(54) Abstract Title

Computer conferencing apparatus 



(57) In a desktop video conferencing system, a plurality of user stations 2-14 are interconnected to exchange 
data. Each user station 2-14 stores data defining a respective, different three-dimensional computer model 
containing avatars of the participants in the conference, and displays images of the model to a user. Each user 
station uses a pair of cameras 26, 28 to record images of the user wearing coloured body markers 70, 72 and a 
headset 30 on which lights 56-64 are mounted. The image data is processed to determine the movements of 
the user and the point in the displayed images at which the user is looking. This information is transmitted to
the other user stations where it is used to animate the corresponding avatar. Each avatar is animated so that
movements of its head correspond to scaled movements of the user's head. Because the computer model at
each user station is different, the movement of the head of a given avatar is different at each user station.
Image data is processed in each user station to determine the type of transformation which relates the
positions of the cameras 26, 28 to be used in determining the user's movements. The sound of the other
participants is output to the headset 30 worn by a user. The sound is adjusted in dependence upon the position
of the user's head. 

FIG. 3 



[Block diagram. Partly legible labels: USER INPUT DEVICE(S), MEMORY, AVATAR AND 3D MODEL PROCESSOR, MPEG 4 DECODER, INPUT/OUTPUT INTERFACE, BITSTREAM, 3D COORDINATES, VIEW PARAMETER. The text of the drawing sheets preceding FIG. 4 is not recoverable from the scan.]



FIG. 4
S2: ESTABLISH CONNECTIONS
S4: SET UP CONFERENCE
S6: CALIBRATE EACH USER STATION
S8: CARRY OUT CONFERENCE



FIG. 5
S20: REQUEST AND STORE NAME OF EACH PARTICIPANT
S22: REQUEST AND STORE AVATAR FOR EACH PARTICIPANT
S24: DEFINE SEATING PLAN
S26: SELECT CONFERENCE TABLE SHAPE
S28: SEND AVATARS, SEATING PLAN, TABLE SHAPE AND PARTICIPANT NAMES TO EACH PARTICIPANT

FIG. 6
[Seating plan. Labels as scanned: 1: MR A; 7: MR G; 6: MR F; MR B; 3: MR C; 5: MISS E; 4: MR D.]



FIG. 7
S40: RECEIVE AND STORE INFORMATION FROM CONFERENCE COORDINATOR
S42: INSTRUCT USER TO INPUT CAMERA PARAMETERS
S44: STORE CAMERA PARAMETERS
S46: INSTRUCT USER TO INPUT SCREEN WIDTH
S48: STORE SCREEN WIDTH
S49: INSTRUCT USER TO WEAR HEADSET AND BODY MARKERS
S50: INSTRUCT USER TO POSITION MOVEABLE LEDs ON HEADSET TO ALIGN WITH USER'S EYES
S52: INSTRUCT USER TO POSITION CAMERAS SO THAT BOTH HAVE A FIELD OF VIEW WHICH COVERS THE USER POSITION
S54: INSTRUCT USER TO MOVE TO EXTREME POSITIONS TO EACH SIDE AND BACKWARDS AND FORWARDS
S56: RECORD AND DISPLAY FRAMES OF IMAGE DATA WITH EACH CAMERA AS USER MOVES TO EXTREME POSITIONS



FIG. 7 (cont)
(YES)
S60: DETERMINE USER'S HEAD SIZE AND RATIO FROM AVATAR
S62: DETERMINE CAMERA TRANSFORMATION MODEL TO BE USED DURING VIDEO CONFERENCE
S64: DETERMINE POSITION OF HEADSET LEDs RELATIVE TO USER'S HEAD
S66: DETERMINE POSITION OF DISPLAY SCREEN AND SET UP STANDARD COORDINATE SYSTEM
S68: SET UP CONFERENCE ROOM TABLE AND NAME LABELS IN 3D MODEL
S70: DEFINE TRANSFORMATION FOR EACH PARTICIPANT MAPPING THE PARTICIPANT'S AVATAR INTO THE CONFERENCE ROOM 3D MODEL
S72: STORE DATA INTO CACHE DEFINING RELATIONSHIP BETWEEN HORIZONTAL SCREEN POSITION AND CONFERENCE PARTICIPANTS
S74: SEND "READY" SIGNAL TO CONFERENCE COORDINATOR



FIG. 8
S90: IDENTIFY IMAGE FROM EACH CAMERA CORRESPONDING TO USER'S MOST LEFT POSITION, MOST RIGHT POSITION, MOST FORWARD POSITION AND MOST BACKWARD POSITION
S92: MATCH POINTS IN EACH PAIR OF CORRESPONDING IMAGES
S94: NORMALISE MATCHED POINTS IN EACH PAIR OF IMAGES
(label not legible): GROUP MATCHED POINTS FROM ALL IMAGES INTO A COMBINED SET
S98: SET UP MEASUREMENT MATRIX FOR THE COMBINED SET
S100: CALCULATE MOST ACCURATE CAMERA TRANSFORMATION USING THE POINTS IN THE COMBINED SET



FIG. 8 (cont)
S106: SELECT THE PERSPECTIVE TRANSFORMATION FOR USE DURING VIDEO CONFERENCE
S108: CONVERT THE PHYSICAL FUNDAMENTAL MATRIX TO CAMERA ROTATION MATRIX AND TRANSLATION VECTOR
S110: SELECT THE AFFINE TRANSFORMATION FOR USE DURING VIDEO CONFERENCE
S112: CONVERT THE AFFINE FUNDAMENTAL MATRIX INTO PHYSICAL VARIABLES DESCRIBING CAMERA TRANSFORMATION

FIG. 9
S130: PERFORM PERSPECTIVE CALCULATION AND STORE RESULTS
S132: PERFORM AFFINE CALCULATION AND STORE RESULTS
S134: SELECT MOST ACCURATE TRANSFORMATION



FIG. 10
S140: SELECT NEXT 7 PAIRS OF MATCHED POINTS
S142: CALCULATE FUNDAMENTAL MATRIX
S144: CONVERT FUNDAMENTAL MATRIX TO PHYSICAL FUNDAMENTAL MATRIX
S146: TEST PHYSICAL FUNDAMENTAL MATRIX AGAINST THE 7 PAIRS OF POINTS USED TO CALCULATE FUNDAMENTAL MATRIX
S148: IS PHYSICAL FUNDAMENTAL MATRIX SUFFICIENTLY ACCURATE? (YES / NO)
S150: TEST PHYSICAL FUNDAMENTAL MATRIX AGAINST ALL OTHER PAIRS OF MATCHED POINTS
S152: IS PHYSICAL FUNDAMENTAL MATRIX MORE ACCURATE THAN ANY PREVIOUSLY CALCULATED? (NO)




FIG. 11
(Step labels S170 to S182 appear on this sheet; the scan does not preserve which label belongs to which box.)
CALCULATE TANGENT PLANE OF SURFACE REPRESENTING PHYSICAL FUNDAMENTAL MATRIX AT 4D POINT DEFINED BY COORDINATES OF NEXT PAIR OF MATCHED POINTS
CALCULATE NORMAL TO TANGENT PLANE
CALCULATE DISTANCE ALONG NORMAL FROM 4D POINT TO SURFACE REPRESENTING PHYSICAL FUNDAMENTAL MATRIX
INCREMENT COUNTER. STORE POINTS. STORE DISTANCE
ANOTHER PAIR OF MATCHED POINTS? (YES)



FIG. 12
S200: SELECT NEXT 4 PAIRS OF MATCHED POINTS
S202: CALCULATE 4 COMPONENTS OF FUNDAMENTAL MATRIX
S204: TEST AFFINE FUNDAMENTAL MATRIX AGAINST EACH PAIR OF MATCHED POINTS
S206: IS AFFINE FUNDAMENTAL MATRIX MORE ACCURATE THAN ANY PREVIOUSLY CALCULATED? (YES)
S208: STORE AFFINE FUNDAMENTAL MATRIX TOGETHER WITH CONSISTENT POINTS, NUMBER OF POINTS AND MATRIX TOTAL ERROR
(A further YES branch label follows S208; the box it belongs to is not legible.)
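The test at step S204 can be pictured with a short sketch. It assumes, as is common for affine cameras but is not stated on this sheet, that the affine fundamental matrix reduces to five coefficients (a, b, c, d, e) satisfying a·x' + b·y' + c·x + d·y + e = 0 for a matched pair (x, y), (x', y'). The function names and the threshold are hypothetical.

```python
import math


def affine_residual(coeffs, pair):
    """Perpendicular distance of a matched pair from the hyperplane
    a*x' + b*y' + c*x + d*y + e = 0 (assumed form of the constraint)."""
    a, b, c, d, e = coeffs
    (x, y), (xp, yp) = pair
    norm = math.sqrt(a * a + b * b + c * c + d * d)
    return abs(a * xp + b * yp + c * x + d * y + e) / norm


def score_matrix(coeffs, pairs, threshold=1.0):
    """Return the pairs consistent with the candidate matrix and their
    summed error. The threshold value is an illustrative assumption."""
    consistent = []
    total_error = 0.0
    for pair in pairs:
        residual = affine_residual(coeffs, pair)
        if residual <= threshold:
            consistent.append(pair)
            total_error += residual
    return consistent, total_error
```

A candidate with more consistent pairs, or the same number and a smaller total error, would then be the one kept at step S208 under this reading.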



FIG. 13
[Diagram; labels not recoverable from the scan.]

FIG. 14
S230: INSTRUCT USER TO LOOK DIRECTLY AT CAMERA TO HIS RIGHT
S232: RECORD FRAME OF IMAGE DATA WITH BOTH CAMERAS
S234: CALCULATE 3D POSITIONS OF THE HEADSET LEDs
S236: DETERMINE ANGLE BETWEEN PLANE PASSING THROUGH THE HEADSET LEDs IN 3D SPACE AND THE IMAGING PLANE OF THE CAMERA AT WHICH THE USER WAS LOOKING



FIG. 15
S250: DETERMINE POSITION IN EACH IMAGE OF THE HEADSET LEDs
S252: PROJECT LINE IN 3 DIMENSIONS FROM NEXT LED POSITION IN EACH OF THE IMAGES
S254: CALCULATE MID-POINT OF LINE WHICH CONNECTS, AND IS PERPENDICULAR TO, BOTH PROJECTED LINES
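The mid-point calculation of steps S252 and S254 can be sketched as follows. This is a minimal illustration using the standard closest-points formula for two 3D lines; the representation of each projected line as a point plus a direction vector, and all function names, are assumptions rather than details taken from the drawings.

```python
def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _along(p, d, s):
    """Point reached by moving a distance parameter s from p along d."""
    return tuple(a + s * b for a, b in zip(p, d))


def midpoint_between_lines(p1, d1, p2, d2):
    """Mid-point of the shortest segment joining the line p1 + s*d1
    and the line p2 + t*d2 (the segment is perpendicular to both)."""
    w = tuple(a - b for a, b in zip(p1, p2))
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("lines are parallel; no unique mid-point")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    q1 = _along(p1, d1, s)  # closest point on the first line
    q2 = _along(p2, d2, t)  # closest point on the second line
    return tuple((u + v) / 2.0 for u, v in zip(q1, q2))
```

Taking the mid-point rather than an exact intersection tolerates the small errors that keep two projected lines from meeting exactly.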




FIG. 17
S270: INSTRUCT USER TO SIT UPRIGHT CENTRALLY IN FRONT OF, AND PARALLEL TO, THE DISPLAY SCREEN WITH HIS TORSO TOUCHING THE EDGE OF THE DESK
S272: INSTRUCT USER TO TURN, BUT NOT OTHERWISE CHANGE POSITION OF, HEAD FOR THE FOLLOWING STEPS
S274: DETERMINE DIRECTION OF PLANE OF DISPLAY SCREEN
S276: DETERMINE 3D POSITION OF PLANE OF DISPLAY SCREEN
S278: SET UP PREDEFINED COORDINATE SYSTEM WITH STANDARD SCALE, AND CALCULATE TRANSFORMATION TO THIS COORDINATE SYSTEM
S280: CALCULATE 3D POSITIONS OF TORSO MARKERS IN STANDARD COORDINATE SYSTEM
S282: TRANSMIT 3D POSITIONS OF TORSO MARKERS TO OTHER PARTICIPANTS



FIG. 18
S300: DISPLAY MARKER IN CENTRE OF SCREEN AND INSTRUCT USER TO LOOK AT THE MARKER
S302: RECORD FRAME OF IMAGE DATA WITH BOTH CAMERAS
S304: CALCULATE 3D POSITIONS OF TORSO MARKERS
S306: CALCULATE 3D POSITIONS OF THE HEADSET LEDs
S308: DETERMINE PLANE OF HEADSET LEDs
S310: ADJUST PLANE BY HEADSET OFFSET ANGLE TO GIVE PLANE PARALLEL TO DISPLAY SCREEN
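Determining the plane of the headset LEDs (step S308) can be illustrated with a small sketch. It assumes the plane is taken through three of the calculated 3D LED positions; the drawings do not say how many LEDs are used or how the plane is fitted, and the function name is hypothetical.

```python
import math


def plane_normal(p0, p1, p2):
    """Unit normal of the plane through three non-collinear 3D points."""
    u = [b - a for a, b in zip(p0, p1)]
    v = [b - a for a, b in zip(p0, p2)]
    # Cross product u x v is perpendicular to both in-plane vectors.
    n = (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )
    length = math.sqrt(sum(c * c for c in n))
    if length == 0.0:
        raise ValueError("points are collinear; plane is undefined")
    return tuple(c / length for c in n)
```

With all five LED positions available, a least-squares fit would be a natural alternative to picking three points.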



FIG. 19
S320: DISPLAY MARKER AT RIGHT EDGE OF SCREEN AND INSTRUCT USER TO LOOK AT THE MARKER
S322: RECORD FRAME OF IMAGE DATA WITH BOTH CAMERAS
S324: DETERMINE ANGLE OF USER'S HEAD RELATIVE TO THE DISPLAY SCREEN
S326: CALCULATE AND STORE 3D POSITION OF THE DISPLAY SCREEN

FIG. 20
S340: CALCULATE 3D POSITIONS OF THE HEADSET LEDs
S342: DETERMINE PLANE OF HEADSET LEDs
S344: ADJUST PLANE BY HEADSET OFFSET ANGLE TO GIVE HEAD PLANE
S346: DETERMINE ANGLE BETWEEN HEAD PLANE AND PLANE PARALLEL TO DISPLAY SCREEN
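The angle determination at step S346 can be sketched under the assumption that each plane is represented by a normal vector, so that the angle between the planes equals the angle between their normals. The function name is hypothetical and the result is unsigned.

```python
import math


def angle_between_planes(n1, n2):
    """Angle in degrees between two planes, given a normal of each."""
    dot = sum(a * b for a, b in zip(n1, n2))
    len1 = math.sqrt(sum(a * a for a in n1))
    len2 = math.sqrt(sum(b * b for b in n2))
    if len1 == 0.0 or len2 == 0.0:
        raise ValueError("normal vectors must be non-zero")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (len1 * len2)))
    return math.degrees(math.acos(cosine))
```

A signed angle (left versus right turn) would need an additional reference direction, which is omitted from this sketch.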



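FIG. 21 and FIG. 22
[Diagrams. Legible labels: DISPLAYED MARKER, DISPLAY SCREEN, d, PLANE PARALLEL TO DISPLAY SCREEN, PLANE OF USER'S HEAD.]

FIG. 24
[Graph; axis labels not recoverable from the scan.]

[The remaining content of drawing sheets 21/39 to 24/39 is not recoverable from the scan.]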



FIG. 26
(Labels S396, S398, S400, S401 and S402 also appear on this sheet; with eight labels for seven legible boxes, their assignment is not recoverable from the scan.)
S390: PROCESS IMAGE DATA FROM SYNCHRONISED FRAMES (ONE FROM EACH CAMERA) TO CALCULATE 3D POSITIONS OF THE HEADSET LEDs AND BODY MARKERS WHICH ARE VISIBLE IN BOTH IMAGES
S392: DETERMINE PLANE OF USER'S HEAD
S394: PROJECT LINE PERPENDICULAR TO PLANE OF USER'S HEAD AND DETERMINE INTERSECTION WITH DISPLAY SCREEN
DETERMINE CAMERA WHICH HAS IMAGING PLANE MOST PARALLEL TO THE PLANE OF THE USER'S HEAD
EXTRACT PIXEL DATA FOR THE USER'S FACE
TRANSFORM 3D COORDINATES OF BODY MARKERS INTO STANDARDISED COORDINATE SYSTEM
ENCODE VIEW PARAMETER, 3D COORDINATES OF BODY MARKERS, AND FACE PIXEL DATA AND TRANSMIT TO OTHER PARTICIPANTS
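The intersection at step S394 amounts to a line-plane intersection, which can be sketched as below. The sketch assumes the display screen is described by a point on it and its normal, and that the gaze line starts at a point on the head plane and runs along that plane's normal; these representations and the function name are illustrative assumptions.

```python
def intersect_line_with_plane(origin, direction, plane_point, plane_normal):
    """Point where the line origin + t*direction meets the plane, or
    None when the line is parallel to the plane."""
    denom = sum(d * n for d, n in zip(direction, plane_normal))
    if abs(denom) < 1e-12:
        return None
    offset = [p - o for p, o in zip(plane_point, origin)]
    t = sum(w * n for w, n in zip(offset, plane_normal)) / denom
    return tuple(o + t * d for o, d in zip(origin, direction))
```

For example, a head 0.6 units in front of a screen lying in the plane z = 0 and looking straight ahead would give the point directly opposite the head on the screen.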



[Drawing sheets 26/39 and 27/39: not recoverable from the scan.]

FIG. 28
S420: AWAIT FURTHER DATA FROM PARTICIPANT
S422: READ NEXT SET OF DATA
S424: CHANGE POSITION OF AVATAR BODY AND ARMS TO FIT RECEIVED 3D COORDINATES OF BODY MARKERS
S426: TEXTURE MAP FACE PIXEL DATA ONTO THE AVATAR'S FACE
S428: TRANSFORM AVATAR INTO 3D CONFERENCE MODEL
S430: CHANGE POSITION OF AVATAR'S HEAD IN 3D CONFERENCE MODEL IN DEPENDENCE UPON PARTICIPANT'S VIEW PARAMETER



FIG. 29A and FIG. 29B
[Diagrams; labels not recoverable from the scan.]

FIG. 30
S450: RENDER IMAGE OF CONFERENCE ROOM MODEL TO GENERATE PIXEL DATA
S452: READ CURRENT VIEW PARAMETER OF USER
S454: AMEND IMAGE DATA WITH MARKER TO SHOW POSITION AT WHICH USER IS DETERMINED TO BE LOOKING
S456: (box text not legible)

FIG. 31
[Diagram; only reference numeral 200 is legible.]



FIG. 32
S468: DECODE INPUT SOUND STREAMS
(label not legible): READ CURRENT HEAD POSITION AND ORIENTATION FOR EACH AVATAR TO DETERMINE DIRECTION FOR EACH SOUND STREAM
S472: READ USER'S CURRENT HEAD POSITION AND ORIENTATION TO DETERMINE OUTPUT DIRECTION
(label not legible): INPUT SOUND STREAMS AND INPUT AND OUTPUT DIRECTIONS TO SOUND GENERATOR AND PROCESS TO GENERATE LEFT AND RIGHT OUTPUT SIGNALS
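One simple way to picture how left and right output signals could depend on the input and output directions is constant-power panning. This is only an illustrative stand-in: the sheet does not say how the sound generator works, and the angle convention (degrees, positive to the listener's right) and the function name are assumptions.

```python
import math


def stereo_gains(source_azimuth_deg, head_azimuth_deg):
    """Left and right gains for a source, relative to where the
    listener's head is pointing, using constant-power panning."""
    relative = source_azimuth_deg - head_azimuth_deg
    relative = max(-90.0, min(90.0, relative))  # clamp to the frontal arc
    pan = (relative + 90.0) / 180.0 * (math.pi / 2.0)
    return math.cos(pan), math.sin(pan)
```

When the user turns his head towards a speaking avatar the relative angle falls to zero and both gains become equal, so the voice appears to come from straight ahead.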



FIG. 33
(Labels S500, S502 and S504 accompany the first boxes; their exact assignment is not recoverable from the scan.)
CALCULATE POSSIBLE CONFIGURATIONS OF THE AVATARS IN THE CONFERENCE ROOM 3D MODEL
SELECT NEXT CALCULATED CONFIGURATION
S506: COMPUTE SMALLEST APPARENT TURN MOVEMENT BETWEEN ANY TWO AVATARS IN THE SELECTED CONFIGURATION
(label not legible): IS THE COMPUTED MOVEMENT LARGER THAN THE CURRENTLY STORED MOVEMENT? (YES)
S510: REPLACE CURRENTLY STORED MOVEMENT WITH NEWLY COMPUTED MOVEMENT AND STORE THE SELECTED AVATAR CONFIGURATION
S514: SELECT STORED AVATAR CONFIGURATION AS THE CONFIGURATION TO BE USED FOR THE VIDEO CONFERENCE
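The selection loop of FIG. 33 keeps the configuration whose smallest apparent turn movement is largest. A minimal sketch follows; the measure of apparent turn movement is passed in as a function because the sheet does not define it, and `smallest_gap` below is a hypothetical stand-in used only to make the example runnable.

```python
def select_configuration(configurations, smallest_turn):
    """Return the configuration maximising the value of smallest_turn,
    together with that value."""
    best_config = None
    best_movement = float("-inf")
    for config in configurations:
        movement = smallest_turn(config)
        if movement > best_movement:  # larger than the stored movement
            best_movement = movement
            best_config = config
    return best_config, best_movement


def smallest_gap(config):
    """Stand-in metric (assumption): the smallest spacing between
    neighbouring avatars along the second coordinate."""
    xs = sorted(x for _, x in config)
    return min(b - a for a, b in zip(xs, xs[1:]))
```

For instance, `select_configuration(candidates, smallest_gap)` would prefer a layout with avatars at evenly spread positions over one in which two avatars sit almost side by side.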



FIG. 34
S520: READ THE VALUE OF THE NUMBER OF AVATARS TO BE DISPLAYED
S522: CALCULATE THE AVERAGE DISTANCE OF THE USER FROM THE DISPLAY SCREEN
S524: CALCULATE THE SCREEN HALF-WIDTH
S526: CALCULATE THE AVERAGE DISTANCE OF THE USER FROM THE DISPLAY SCREEN AS A MULTIPLE OF THE SCREEN HALF-WIDTH
S528: READ THE MINIMUM DISPLAY SIZE VALUE FOR THE SMALLEST AVATAR(S)
S530: CALCULATE THE MAXIMUM DISTANCE ALONG THE Z-AXIS FOR THE SMALLEST AVATAR(S)
S532: READ THE MAXIMUM DISPLAY SIZE VALUE FOR THE SMALLEST AVATAR(S)
S534: CALCULATE THE MINIMUM DISTANCE ALONG THE Z-AXIS FOR THE SMALLEST AVATAR(S)
S536: READ THE Z-AXIS RESOLUTION VALUE TO BE USED FOR CALCULATING AVATAR CONFIGURATIONS



FIG. 35A and FIG. 35B
[Diagrams with regions labelled REAL WORLD and 3D COMPUTER MODEL. Legible labels: points P2 to P6; coordinates (-k, 0), (0, 1), (0, 0), (0, -1/2), (0, -1), (Z3, X3), (Z4, 0); reference numerals 500, 510, 520, 530.]

FIG. 36A, FIG. 36B and FIG. 36C
[Diagrams. Legible labels: points P2 to P6; reference numerals 640, 650, 660, 670, 680, 690, 700.]



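FIG. 38
[Look-up table. Rows: number of avatars to be displayed. Columns: distance of viewer from display, measured as a proportion of the display half-width, given as 2: (-2, 0), 3: (-3, 0), 4: (-4, 0), 5: (-5, 0). Each cell lists one coordinate pair per avatar.]

3 avatars
2: (-2, 0):  (0, 1), (0.78, 0), (0, -1)
3: (-3, 0):  (0, 1), (0.82, 0), (0, -1)
4: (-4, 0):  (0, 1), (0.86, 0), (0, -1)
5: (-5, 0):  (0, 1), (0.88, 0), (0, -1)

4 avatars
2: (-2, 0):  (0, 1), (1.04, 0.50667), (1.04, -0.50667), (0, -1)
3: (-3, 0):  (0, 1), (1, 0.44444), (1, -0.44444), (0, -1)
4: (-4, 0):  (0, 1), (1.12, 0.42657), (1.12, -0.42667), (0, -1)
5: (-5, 0):  (0, 1), (1.14, 0.40933), (1.14, -0.40933), (0, -1)

5 avatars
2: (-2, 0):  (0, 1), (1.08, 0.77), (1.46, 0), (1.08, -0.77), (0, -1)
3: (-3, 0):  (0, 1), (1.04, 0.67333), (1.36, 0), (1.04, -0.67333), (0, -1)
4: (-4, 0):  (0, 1), (1.02, 0.6275), (1.28, 0), (1.02, -0.6275), (0, -1)
5: (-5, 0):  (0, 1), (0.9, 0.59), (1.12, 0), (0.9, -0.59), (0, -1)

6 avatars
2: (-2, 0):  (0, 1), (1.02, 0.906), (1.56, 0.356), (1.56, -0.356), (1.02, -0.906), (0, -1)
3: (-3, 0):  (0, 1), (0.9, 0.78), (1.32, 0.288), (1.32, -0.288), (0.9, -0.78), (0, -1)
4: (-4, 0):  (0, 1), (0.94, 0.741), (1.3, 0.265), (1.3, -0.265), (0.94, -0.741), (0, -1)
5: (-5, 0):  (0, 1), [entry not legible], (1.26, 0.2504), (1.26, -0.2504), (0.94, -0.7128), (0, -1)

7 avatars
2: (-2, 0):  (0, 1), (0.82, 0.94), (1.34, 0.55667), (1.52, 0), (1.34, -0.55667), (0.82, -0.94), (0, -1)
3: (-3, 0):  (0, 1), (0.94, 0.87556), (1.44, 0.49333), (1.6, 0), (1.44, -0.49333), (0.94, -0.87556), (0, -1)
4: (-4, 0):  (0, 1), (0.8, 0.8), (1.18, 0.43167), (1.3, 0), (1.18, -0.43167), (0.8, -0.8), (0, -1)
5: (-5, 0):  (0, 1), (0.76, 0.768), (1.1, 0.40667), (1.2, 0), (1.1, -0.40667), (0.76, -0.768), (0, -1)

8 avatars
2: (-2, 0):  (0, 1), (0.7, 0.96429), (1.2, 0.68571), (1.46, 0.24714), (1.46, -0.24714), (1.2, -0.68571), (0.7, -0.96429), (0, -1)
3: (-3, 0):  (0, 1), (0.76, 0.89524), (1.22, 0.60286), (1.44, 0.21143), (1.44, -0.21143), (1.22, -0.60286), (0.76, -0.89524), (0, -1)
4: (-4, 0):  (0, 1), (0.76, 0.85), (1.16, 0.55286), (1.34, 0.19071), (1.34, -0.19071), (1.16, -0.55286), (0.76, -0.85), (0, -1)
5: (-5, 0):  (0, 1), (0.8, 0.82857), (1.16, 0.528), (1.32, 0.18057), (1.32, -0.18057), (1.16, -0.528), (0.8, -0.82857), (0, -1)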



2351216 



COMPUTER CONFERENCING APPARATUS

The present invention relates to the field of remote conferencing, and more 
particularly, to computer conferencing carried out by animating three-dimensional
computer models (avatars) in dependence upon real-life movements of the
conference participants.

A number of systems for carrying out remote conferences, such as video 
conferences, are known. 

In a typical, conventional system, a camera records images of one or more
conference participants and the image data is sent to the other participants,
where it is displayed. In conferences involving participants at three or more
sites, data is displayed to a given one of the participants by displaying the
video images of the participants from the other sites side by side on a display
screen. This type of system suffers from a number of problems, however. For
example, the direction of a participant's gaze (that is, where the participant is
looking) and his body gestures cannot be accurately communicated to the other
participants. More particularly, if a participant turns his head, or points, to the
participant displayed on the right of his screen, then the other participants see
the user move his head, or point, to the right but do not know how this
movement relates to the other participants in the conference. Accordingly, it
is not possible to reproduce eye contact and body gestures, which have been
shown to be necessary cues for effective communication.

A common solution to the problem of communicating gaze and gestures is to
perform the video conference in a virtual space which is shared by all of the
participants. Each participant is represented by an avatar (that is, a
three-dimensional computer model) in the virtual space, and the avatars are then
animated using motion parameters measured from the motion of the real
participants. In this way, a participant can move around the conference room
by transmitting the necessary motion parameters to his avatar. Images of the
avatars in the virtual space are displayed to each participant so that a simulated
video conference is seen. Methods which have been suggested for image
display in such a system include displaying the images on a large, life-size
display so that the participants are positioned where they would have been if
the meeting was real, for example as described in "Virtual space
teleconferencing: real-time reproduction of 3D human images" by J. Ohya, Y.
Kitamura, F. Kishino and N. Terashima in Journal of Visual Communication
and Image Representation, 6(1), pages 1-25, March 1995. In this way, as the
participant to whom the images are displayed moves his head, different parts
of the screen become visible in the same way that different parts of the
meeting room would become visible in a real-world conference. This method,
however, suffers from the problem that the display is extremely large and
expensive.



Another method which has been suggested for displaying images of a virtual
conference is to display them on a conventional small screen display device,
and to change the view displayed on the device to the participant as the
direction of the eyes of the participant's avatar changes in the virtual space, for
example as described in US 5736982. It has also been suggested to display
images in this way on a head-mounted display. These systems suffer from the
problem, however, that the user can find the displayed images confusing and
unnatural. In addition, head-mounted displays are expensive and somewhat
cumbersome.

A further approach to communicating gaze and gesture information is 
disclosed in "Interfaces for Multiparty Videoconferences" by Buxton et al in 
Video-Mediated Communication (Editors Finn, Sellen & Wilbur), Lawrence 
Erlbaum Associates, 1997, ISBN 0-8058-2288-7, pages 385-400. In this 
system, a virtual conference approach is not adopted, and instead video images 
of each participant are recorded and sent to the other participants. The video 
images for each participant are then displayed on separate display modules 
which are arranged around the viewer's desk in exactly the same positions that 
the participants would occupy in a real video conference. This system suffers 
from the problems that the number of display modules, and hence the cost,
increases as the number of participants in the meeting increases, and also the 
process for arranging the modules in the correct positions is difficult and time 
consuming. 

A further approach is disclosed in "Look Who's Talking: The GAZE
Groupware System" by Vertegaal et al, in Summary of ACM CHI'98
Conference on Human Factors in Computing Systems, Los Angeles, April
1998, pages 293-294. In this system, a shared virtual meeting room is again
proposed, but instead of avatars, a two-dimensional model of a display screen
is positioned where each participant would sit. Images of the virtual room are
then rendered from a unique, constant viewpoint for each participant. An eye
tracking system is used to measure each participant's eye movements in real
life and a camera records snap shots of the user. The 2D display screen in the
virtual meeting room for a participant is then moved according to the
participant's eye movements by rotating it about one or two axes, and the snap
shot image data is presented on the 2D display screen. This system, too,
suffers from a number of problems. For example, the images displayed to
each participant are unrealistic, and it becomes difficult to arrange the 2D
display screens in the virtual conference room for any more than four
conference participants.

The present invention has been made with the above problems in mind. 

The present invention provides a computer conferencing system and method, 
and apparatus for use therein, in which gaze information is communicated by
providing avatars of the participants in a different three-dimensional model at
each participant apparatus, and by changing the view direction of each avatar
using information defining where the corresponding participant is looking in
real-life. 

In this way, because the three-dimensional model is different at each 
participant apparatus, a participant's avatar undergoes different movements at
each apparatus, and gaze information can be accurately conveyed. 

The present invention also provides a system for conducting a virtual meeting 
comprising a plurality of apparatus for use by participants which are arranged 
to generate and exchange data such that rotations of a participant's head in 
real-life cause rotations of the head of a corresponding avatar which differ at
different apparatus.



Embodiments of the invention will now be described, by way of example only, 
with reference to the accompanying drawings in which:

Figure 1 schematically shows a plurality of user stations interconnected to
carry out a video conference in an embodiment of the invention;

Figure 2A shows a user station and a user, Figure 2B shows the headset and
body markers worn by the user, and Figure 2C shows the components of the
headset worn by the user;

Figure 3 is a block diagram showing an example of notional functional
components within the computer processing apparatus at each user station;

Figure 4 shows the steps performed to carry out a video conference;

Figure 5 shows the processing operations performed at step S4 in Figure 4;

Figure 6 shows an example seating plan defined at step S24 in Figure 5;

Figure 7 shows the processing operations performed at step S6 in Figure 4;

Figure 8 shows the processing operations performed at step S62 in Figure 7;

Figure 9 shows the processing operations performed at step S100 in Figure 8;

Figure 10 shows the processing operations performed at step S130 in Figure 9;

Figure 11 shows the processing operations performed at step S146 and
step S150 in Figure 10;

Figure 12 shows the processing operations performed at step S132 in Figure 9;

Figure 13 illustrates the offset angle θ between the plane of the user's head and
the plane of his headset calculated at step S64 in Figure 7;

Figure 14 shows the processing operations performed at step S64 in Figure 7;

Figure 15 shows the processing operations performed at step S234 in
Figure 14;

Figure 16 illustrates the line projection and mid-point calculation performed
at step S252 and step S254 in Figure 15;

Figure 17 shows the processing operations performed at step S66 in Figure 7;

Figure 18 shows the processing operations performed at step S274 in
Figure 17;

Figure 19 shows the processing operations performed at step S276 in
Figure 17;

Figure 20 shows the processing operations performed at step S324 in
Figure 19;

Figure 21 illustrates the angle calculation performed at step S346 in Figure 20;

Figure 22 illustrates the standard coordinate system set up at step S278 in
Figure 17;

Figures 23A, 23B, 23C, 23D and 23E show examples of avatar positions at
conference room tables;

Figure 24 shows a piece-wise linear function relating horizontal screen
position to view parameter, which is stored at step S72 in Figure 7;

Figure 25 shows the processing operations performed at step S8 in Figure 4;

Figure 26 shows the processing operations performed at step S370 in
Figure 25;

Figures 27A, 27B and 27C illustrate the calculation at step S394 in Figure 26
of the point at which the user is looking by projecting a line from the plane of
the user's head and determining the intersection of the line with the display
screen;

Figure 28 shows the processing operations performed in each of steps S374-1
to S374-6 in Figure 25;

Figures 29A, 29B and 29C illustrate how the position of an avatar's head is
changed in dependence upon changes of the corresponding participant's head
in real-life at step S430 in Figure 28;

Figure 30 shows the processing operations performed at step S376 in
Figure 25;

Figure 31 illustrates examples of markers displayed in images at steps S454
and S456 in Figure 30;

Figure 32 shows the processing operations performed at step S378 in
Figure 25;

Figure 33 shows the processing operations performed in a modification to
calculate the positions of the avatars in a 3D computer model of the
conference room such that, when an image of the conference room model is
displayed on the display screen, the avatars are evenly spaced across a
horizontal line on the display and the minimum movement which the head of
an avatar appears to undergo to look from one avatar to another is maximised;

Figure 34 shows the processing operations performed at step S500 in
Figure 33;

Figures 35A and 35B schematically illustrate the processing performed at step
S502 in Figure 33;

Figures 36A, 36B and 36C schematically illustrate the processing performed
at step S506 in Figure 33;

Figure 37 shows examples of the results of performing the processing at steps
S500 to S512 in Figure 33; and

Figure 38 shows an example of a look-up-table and the data stored therein 
which may be stored at a user station to determine the positions of avatars in 
the 3D computer model of the conference room at the user station. 
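How a user station might consult such a look-up table can be sketched as follows. The entries shown are the three-avatar values from FIG. 38; keying the table by (number of avatars, viewer distance in screen half-widths) and falling back to the nearest tabulated distance are assumptions made for illustration, as are all names.

```python
# (number of avatars, viewer distance in half-widths) -> avatar coordinates
AVATAR_POSITIONS = {
    (3, 2): [(0, 1), (0.78, 0), (0, -1)],
    (3, 3): [(0, 1), (0.82, 0), (0, -1)],
    (3, 4): [(0, 1), (0.86, 0), (0, -1)],
    (3, 5): [(0, 1), (0.88, 0), (0, -1)],
}


def lookup_positions(num_avatars, viewer_distance):
    """Return the stored positions for the tabulated distance closest
    to viewer_distance (nearest-entry choice is an assumption)."""
    distances = [d for (n, d) in AVATAR_POSITIONS if n == num_avatars]
    if not distances:
        raise KeyError("no entries for %d avatars" % num_avatars)
    nearest = min(distances, key=lambda d: abs(d - viewer_distance))
    return AVATAR_POSITIONS[(num_avatars, nearest)]
```

Storing precomputed positions in this way avoids repeating the search of FIG. 33 each time a conference is set up.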

Referring to Figure 1, in this embodiment, a plurality of user stations 2, 4, 6, 
8, 10, 12, 14 are connected via a communication path 20, such as the Internet,
wide area network (WAN), etc. 

As will be described below, each user station 2, 4, 6, 8, 10, 12, 14 comprises 
apparatus to facilitate a desktop video conference between the users at the user 
stations. 

Figures 2A, 2B and 2C show the components of each user station 2, 4, 6, 8, 
10, 12, 14 in this embodiment. 

Referring to Figure 2A, a user station comprises a conventional personal 
computer (PC) 24, two video cameras 26, 28 and a pair of stereo 
headphones 30. 

PC 24 comprises a unit 32 containing, in a conventional manner, one or more
processors, memory, and sound card etc, together with a display device 34,
and user input devices which, in this embodiment, comprise a keyboard 36
and mouse 38.



PC 24 is programmed to operate in accordance with programming instructions
input for example as data stored on a data storage medium, such as disk 40,
and/or as a signal input to PC 24 for example over a datalink (not shown) such
as the Internet, and/or entered by a user via keyboard 36.

PC 24 is connected to the Internet 20 via a connection (not shown) enabling
it to transmit data to, and receive data from, the other user stations.

Video cameras 26 and 28 are provided to record video images of user 44, and,
in this embodiment, are of conventional charge coupled device (CCD) design.
As will be described below, image data recorded by cameras 26 and 28 is
processed by PC 24 to generate data defining the movements of user 44, and
this data is then transmitted to the other user stations. Each user station stores
a three-dimensional computer model of the video conference containing an
avatar for each participant, and each avatar is animated in response to the data
received from the user station of the corresponding participant.

In the example shown in Figure 2A, cameras 26 and 28 are positioned on top 
of monitor 34, but can, however, be positioned elsewhere to view user 44. 

Referring to Figures 2A and 2B, a plurality of coloured markers 70, 72 are
provided to be attached to the clothing of user 44. The markers each have a
different colour, and, as will be explained later, are used to determine the
position of the user's torso and arms during the video conference. The
markers 70 are provided on elasticated bands to be worn around the user's
wrists, elbows and shoulders. A plurality of markers 70 are provided on each
elasticated band so that at least one marker will be visible for each position
and orientation of the user's arms. The markers 72 are provided with a suitable
adhesive so that they can be removably attached to the torso of user 44, for
example along a central line, as shown in Figure 2B, such as at the positions
of buttons on the user's clothes.

Referring to Figure 2C, headset 30 comprises earphones 48, 50 and a 
microphone 52 provided on a headband 54 in a conventional manner. In 
addition, light emitting diodes (LEDs) 56, 58, 60, 62 and 64 are also provided 
on headband 54. Each of the LEDs 56, 58, 60, 62 and 64 has a different 
colour, and, in use, is continuously illuminated. As will be explained later, the 
LEDs are used to determine the position of the user's head during the video
conference. 

LED 56 is mounted so that it is central with respect to earphone 48 and
LED 64 is mounted so that it is central with respect to earphone 50. The 
distance "a" between LED 56 and the inner surface of earphone 48 and 
between LED 64 and the inner surface of earphone 50 is pre-stored in PC 24 
for use in processing to be performed during the video conference, as will be 
described below. LEDs 58 and 62 are slidably mounted on headband 54 so 
that their positions can be individually changed by user 44. LED 60 is 
mounted on a member 66 so that it protrudes above the top of headband 54. 
In this way, when mounted on the head of user 44, LED 60 is held clear of the
user's hair. Each of the LEDs 56, 58, 60, 62 and 64 is mounted centrally with 
respect to the width of headband 54, so that the LEDs lie in a plane defined by 
the headband 54. 

Signals from microphone 52 and signals to earphones 48, 50 are carried to
and from PC 24 via wires in cable 68. Power to LEDs 56, 58, 60, 62 and 64 
is also carried by wires in cable 68. 

Figure 3 schematically shows the functional units into which the components 
of PC 24 effectively become configured when programmed by programming 
instructions. The units and interconnections shown in Figure 3 are notional 
and are shown for illustration purposes only to assist understanding; they do 
not necessarily represent the exact units and connections into which the 
processor, memory, etc of PC 24 become configured. 

Referring to Figure 3, central controller 100 processes inputs from user input 
devices such as keyboard 36 and mouse 38, and also provides control and 
processing for a number of the other functional units. Memory 102 is 
provided for use by central controller 100. 

Image data processor 104 receives frames of image data recorded by video 
cameras 26 and 28. The operation of cameras 26 and 28 is synchronised so 
that images taken by the cameras at the same time can be processed by image 
data processor 104. Image data processor 104 processes synchronous frames 
of image data (one from camera 26 and one from camera 28) to generate data
defining (i) image pixel data for the user's face, (ii) the 3D coordinates of each
of the markers 70 and 72 on the user's arms and torso, and (iii) a view
parameter which, as will be explained further below, defines the direction in
which the user is looking. Memory 106 is provided for use by image data
processor 104.

The data output by image data processor 104 and the sound from
microphone 52 are encoded by MPEG 4 encoder 108 and output to the other
user stations via input/output interface 110 as an MPEG 4 bitstream.

Corresponding MPEG 4 bitstreams are received from each of the other user 
stations and input via input/output interface 110. Each of the bitstreams 
(bitstream 1, bitstream 2 .... bitstream "n") is decoded by MPEG 4 
decoder 112. 

Three-dimensional avatars (computer models) of each of the other participants
in the video conference and a three-dimensional computer model of the 
conference room are stored in avatar and 3D conference model store 114. 

In response to the information in the MPEG 4 bitstreams from the other 
participants, model processor 116 animates the stored avatars so that the 
movements of each avatar mimic the movements of the corresponding 
participant in the video conference. 

Image renderer 118 renders an image of the 3D model of the conference room 
and the avatars, and the resulting pixel data is written to frame buffer 120 and
displayed on monitor 34 at a video rate. In this way, images of the avatars and
3D conference model are displayed to the user, and the images show the 
movement of each avatar corresponding to the movements of the participants 
in real-life. 

Sound data from the MPEG 4 bitstreams received from the other participants
is processed by sound generator 122 together with information from image
data processor 104 defining the current position and orientation of the head of
user 44, to generate signals which are output to earphones 48 and 50 in order
to generate sound to user 44. In addition, signals from microphone 52 are
processed by sound generator 122 so that sound from the user's own
microphone 52 is heard by the user via his headphones 48 and 50.

Figure 4 shows, at a top level, the processing operations carried out to conduct
a video conference between the participants at user stations 2, 4, 6, 8, 10, 12
and 14.

Referring to Figure 4, at step S2, suitable communication connections between
each of the user stations 2, 4, 6, 8, 10, 12, 14 are established in a conventional
manner.

At step S4, processing operations are performed to set up the video 
conference. These operations are performed by one of the user stations, 
previously designated as the conference coordinator.

Figure 5 shows the processing operations performed at step S4 to set up the
conference.

Referring to Figure 5, at step S20, the conference coordinator requests the
name of each participant, and stores the replies when they are received.

At step S22, the conference coordinator requests the avatar of each participant, 
and stores the avatars when they are received. Each avatar comprises a three- 
dimensional computer model of the participant, and may be provided by prior 
laser scanning of the participant in a conventional manner, or in other
conventional ways, for example as described in University of Surrey Technical
Report CVSSP - hilton98a, University of Surrey, Guildford, UK.

At step S24, the conference coordinator defines a seating plan for the 
participants taking part in the video conference. In this embodiment, this step 
comprises assigning a number to each participant (including the conference 
coordinator) and defining the order of the participants in a circle, for example 
as shown in Figure 6. 

At step S26, the conference coordinator selects whether a circular or
rectangular conference room table is to be used for the video conference. 

At step S28, the conference coordinator sends data via Internet 20 defining
each of the avatars received at step S22 (including his own), the participant
numbers and seating plan defined at step S24, the table shape selected at
step S26, and the participants' names received at step S20 (including his own)
to each of the other participants in the video conference.

Referring again to Figure 4, at step S6, processing operations are performed
to calibrate each user station 2, 4, 6, 8, 10, 12, 14 (including the user station
of the conference coordinator).

Figure 7 shows the processing operations performed at step S6 to calibrate one
of the user stations. These processing operations are performed at every user
station.

Referring to Figure 7, at step S40, the data transmitted by the conference
coordinator at step S28 (Figure 5) is received and stored. The three-
dimensional avatar model of each participant is stored in its own local
reference system in avatar and 3D conference model store 114. The other data
is stored, for example in memory 102, for subsequent use.

At step S42, central controller 100 requests user 44 to input information about
the cameras 26, 28. Central controller 100 does this by displaying a message
on monitor 34 requesting the user to input for each camera the focal length of
the lens in millimetres and the size of the imaging charge-coupled device (CCD)
within the camera. This may be done by displaying on monitor 34 a list of
conventional cameras, for which the desired information is pre-stored in
memory 102, and from which user 44 can select the camera used, or by the
user inputting the information directly. At step S44, the camera parameters
input by the user are stored, for example in memory 102, for future use.

At step S46, central controller 100 displays a message on monitor 34
requesting user 44 to input the width in millimetres of the screen of
monitor 34, and at step S48, the width which is input by the user is stored, for
example in memory 102, for future use.

At step S49, central controller 100 displays a message on monitor 34 
instructing the user to wear the headset 30 and body markers 70, 72, as 
previously described with reference to Figures 2A, 2B and 2C. When the user
has completed this step, he inputs a signal to central controller 100 using
keyboard 36. Power is then supplied to headset 30 worn by user 44 so that
each of the LEDs 56, 58, 60, 62 and 64 is continuously illuminated.

At step S50, central controller 100 displays a message on monitor 34 
instructing the user to position the movable LEDs 58, 62, on headset 30 so that 
the LEDs align with the user's eyes. When the user has slid LEDs 58 and 62
on headband 54 so that they align with his eyes, he inputs a signal to central
controller 100 using keyboard 36. 

At step S52, central controller 100 displays a message on monitor 34 
instructing the user to position cameras 26 and 28 so that both cameras have
a field of view which covers the user's position in front of PC 24. When the
user has positioned the cameras, he inputs a signal to central controller 100
using keyboard 36. 

At step S54, central controller 100 displays a message on monitor 34 
instructing the user to move backwards, forwards, and to each side over the 
full range of distances that the user is likely to move during the video
conference. At step S56, as the user moves, frames of image data are recorded
by cameras 26 and 28 and displayed on monitor 34, so that the user can check
whether he is visible to each camera at all positions.

At step S58, central controller 100 displays a message on monitor 34 asking
the user whether it is necessary to adjust the positions of the cameras so that
the user is visible throughout the full range of his likely movements. If the
user inputs a signal using keyboard 36 indicating that camera adjustment is
necessary, steps S52 to S58 are repeated until the cameras are correctly
positioned. On the other hand, if the user inputs a signal indicating that the
cameras are correctly positioned, then processing proceeds to step S60.

At step S60, central controller 100 processes the data defining the avatar of
user 44 to determine the user's head ratio, that is, the ratio of the width of the
user's head (defined by the distance between the user's ears) to the length of
the user's head (defined by the distance between the top of the user's head and
the top of his neck), and also the width of the user's head in real-life (which
can be determined since the scale of the avatar is known). The head ratio and
real-life width are stored, for example in memory 106, for subsequent use by
the image data processor 104.

At step S62, central controller 100 and image data processor 104 use the
frames of image data previously recorded at step S56 (after the cameras 26 and
28 had been positioned for the final time) to determine the camera
transformation model to be used during the video conference. The camera
transformation model defines the relationship between the image plane (that
is, the plane of the CCD) of camera 26 and the image plane of camera 28
which will be used to reconstruct the three-dimensional positions of the
headset LEDs 56, 58, 60, 62, 64 and the body markers 70, 72 using images of
these LEDs and markers recorded by the cameras 26 and 28.

Figure 8 shows the processing operations performed by central controller 100 
and image data processor 104 at step S62 to determine the camera 
transformation model. 

Referring to Figure 8, at step S90, the frames of image data recorded at 
step S56 are processed to identify the pair of synchronous images (that is, the
image from camera 26 and the image from camera 28 recorded at the same
time) which show the most left position, the pair which show the most right
position, the pair which show the most forward position, and the pair which
show the most backward position to which the user moved. In this
embodiment, step S90 is performed by displaying the sequence of images
recorded by one of the cameras at step S56, and instructing the user to input
a signal, for example via keyboard 36 or mouse 38, when the image for each
of the extreme positions is displayed. As noted above, these positions
represent the extents of the user's likely movement during the video
conference. As well as images for the most forward and backward positions,
images for the most left position and most right position are identified and
considered in subsequent processing to determine the camera transformation
model since each of the cameras 26 and 28 is positioned at an angle to the 
user, and so movement of the user to the right or left increases or decreases the
distance of the user from each of the cameras.

At step S92, the image data for each of the four pairs of images identified at
step S90 (that is, the pair of images for the most left position, the pair of
images for the most right position, the pair of images for the most forward
position and the pair of images for the most backward position) is processed
to identify the positions of the LEDs 56, 58, 60, 62, 64 and coloured body
markers 70, 72 which are visible in each image of the pair and to match each
of the identified points between the images in the pair. In this step, since each
LED and each body marker has a unique predetermined colour, the pixel data
for each image in a synchronised pair is processed to identify those pixels
having one of the predetermined colours by examining the RGB values of the
pixels. Each group of pixels having one of the predetermined colours is then
processed using a convolution mask to find the coordinates within the image
as a whole of the centre of the group of pixels. This is performed in a
conventional manner, for example as described in "Affine Analysis of Image
Sequences" by L.S. Shapiro, Cambridge University Press, 1995,
ISBN 0-521-55063-7, pages 16-23. The matching of points between images is done by
identifying the point in each image which has the same colour (of course, if a
marker or LED is visible to only one of the cameras 26 or 28, and hence
appears in only one image, then no matched pair of points will be identified
for this LED or marker).
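
The colour-based detection and matching of step S92 can be illustrated with a
minimal sketch. It is not the cited convolution-mask method: it simply averages
the positions of all pixels close to a reference colour. The function names, the
per-channel `tolerance` value and the marker names are assumptions made for
illustration only.

```python
import numpy as np

def marker_centroid(image, colour, tolerance=30):
    """Return the (x, y) pixel centroid of pixels close to `colour`, or None.

    image: H x W x 3 uint8 RGB array; colour: (r, g, b) reference value.
    `tolerance` is a hypothetical per-channel threshold, not from the text.
    """
    diff = np.abs(image.astype(np.int32) - np.asarray(colour, dtype=np.int32))
    mask = np.all(diff <= tolerance, axis=-1)
    if not mask.any():
        return None  # marker or LED not visible to this camera
    rows, cols = np.nonzero(mask)
    # x counts pixels across, y counts pixels down from the top left corner
    return float(cols.mean()), float(rows.mean())

def match_markers(image_a, image_b, colours):
    """Match points between two synchronous images by their unique colour."""
    matches = {}
    for name, colour in colours.items():
        point_a = marker_centroid(image_a, colour)
        point_b = marker_centroid(image_b, colour)
        # A marker seen by only one camera yields no matched pair
        if point_a is not None and point_b is not None:
            matches[name] = (point_a, point_b)
    return matches
```

A marker that appears in only one of the two images is omitted from the result,
mirroring the parenthetical remark at the end of step S92.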

At step S94, the coordinates of the matched points identified at step S92 are
normalised. Up to this point, the coordinates of the points are defined in terms
of the number of pixels across and down an image from the top left hand
corner of the image. At step S94, the camera focal length and image plane size
previously stored at step S44 are used to convert the coordinates of the points
from pixels to a coordinate system in millimetres having an origin at the
camera optical centre. The millimetre coordinates are related to the pixel
coordinates as follows:

$$x^{*} = h \times (x - C_x) \qquad (1)$$

$$y^{*} = -v \times (y - C_y) \qquad (2)$$

where (x*,y*) are the millimetre coordinates, (x,y) are the pixel coordinates,
(C_x,C_y) is the centre of the image (in pixels), which is defined as half of the
number of pixels in the horizontal and vertical directions, and "h" and "v" are
the horizontal and vertical distances between adjacent pixels (in mm).
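
As a worked illustration of equations (1) and (2), the following sketch converts
a pixel coordinate to millimetres. Deriving the pixel spacing by dividing the
CCD size by the number of pixels is an assumption (the text only states that the
focal length and image plane size are used), and the function names are
hypothetical.

```python
def pixel_pitch(sensor_size_mm, image_size_px):
    """Distance between adjacent pixels along one axis, in mm (assumed relation)."""
    return sensor_size_mm / image_size_px

def pixel_to_mm(x, y, width_px, height_px, h, v):
    """Apply equations (1) and (2) to one pixel coordinate (x, y)."""
    cx = width_px / 2.0   # image centre: half the number of horizontal pixels
    cy = height_px / 2.0  # image centre: half the number of vertical pixels
    x_mm = h * (x - cx)
    y_mm = -v * (y - cy)  # sign flip: pixel rows count downwards
    return x_mm, y_mm
```

For example, with a 640 by 480 image and a pixel spacing of 0.01 mm in both
directions, the pixel (420, 140) lies 100 pixels right of and 100 pixels above
the centre, so it maps to approximately (1.0, 1.0) mm.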

At step S96, a set is formed of all the matched pairs of points identified at step
S92. This combined set therefore contains points for all four pairs of images.
Of course, the number of points in the combined set from each pair of images
may be different, depending upon which LEDs and body markers are visible
in the images. However, the large number of body markers and LEDs ensures
that at least seven markers or LEDs will be visible in each image, giving a
minimum of 4 × 7 = 28 pairs of matched points in the combined set.

At step S98, a measurement matrix, M, is set up as follows for the points in the
combined set created at step S96:

M = [not legible]    ....(3)

where (x,y) are the pixel coordinates of the point in the first image of a pair,
(x',y') are the pixel coordinates of the corresponding (matched) point in the
second image of the pair, and the numbers 1 to k indicate to which pair of
points the coordinates correspond (there being k pairs of points in total).

At step S100, the most accurate camera transformation for the matched points
in the combined set is calculated. By calculating this transformation using the
combined set of points created at step S96, the transformation is calculated
using points matched in a pair of images representing the user's most left
position, a pair of images representing the user's most right position, a pair of
images representing the user's most forward position, and a pair of images
representing the user's most backward position. Accordingly, the calculated
transformation will be valid over the user's entire workspace.

Figure 9 shows the processing operations performed at step S100 to calculate
the most accurate camera transformation.

Referring to Figure 9, at step S130, a perspective transformation is calculated,
tested and stored.

Figure 10 shows the processing operations performed at step S130.

Referring to Figure 10, at step S140, the next seven pairs of matched points in
the combined set created at step S96 are selected (this being the first seven
pairs the first time step S140 is performed).

At step S142, the selected seven pairs of points and the measurement matrix
set at step S98 are used to calculate the fundamental matrix, F, representing
the geometrical relationship between the cameras, F being a three by three
matrix satisfying the following equation:

$$\begin{pmatrix} x' & y' & 1 \end{pmatrix} F \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = 0 \qquad (4)$$

where (x,y,1) are the homogeneous pixel coordinates of any of the seven
selected points in the first image of a pair, and (x',y',1) are the corresponding
homogeneous pixel coordinates in the second image of the pair.

The fundamental matrix is calculated in a conventional manner, for example
using the technique disclosed in "Robust Detection of Degenerate
Configurations Whilst Estimating the Fundamental Matrix" by P.H.S. Torr,
A. Zisserman and S. Maybank, Oxford University Technical Report 2090/96.

It is possible to select more than seven pairs of matched points at step S140
and to use these to calculate the fundamental matrix at step S142. However,
seven pairs of points are used in this embodiment, since this has been shown
empirically to produce satisfactory results, and also represents the minimum
number of pairs needed to calculate the parameters of the fundamental matrix,
reducing processing requirements.

At step S144, the fundamental matrix, F, is converted into a physical
fundamental matrix, F_phys, using the camera data stored at step S44 (Figure 7).
This is again performed in a conventional manner, for example as described
in "Motion and Structure from Two Perspective Views: Algorithms, Error
Analysis and Error Estimation" by J. Weng, T.S. Huang and N. Ahuja, IEEE
Transactions on Pattern Analysis and Machine Intelligence, Vol. 11, No. 5,
May 1989, pages 451-476, and as summarised below.

First the essential matrix, E, which satisfies the following equation is
calculated:

$$\begin{pmatrix} x^{*\prime} & y^{*\prime} & f \end{pmatrix} E \begin{pmatrix} x^{*} \\ y^{*} \\ f \end{pmatrix} = 0 \qquad (5)$$

where (x*, y*, f) are the coordinates of any of the selected seven points in the
first image in a millimetre coordinate system whose origin is at the centre of
the image, the z coordinate having been normalised to correspond to the focal
length, f, of the camera, and (x*', y*', f) are the corresponding coordinates of
the matched point in the second image of the pair. The fundamental matrix,
F, is converted into the essential matrix, E, using the following equations:



$$A = \begin{pmatrix} 1/h & 0 & C_x/f \\ 0 & -1/v & C_y/f \\ 0 & 0 & 1/f \end{pmatrix} \qquad (6)$$

$$M = A^{T} F A \qquad (7)$$

E = [not legible]    ....(8)

where the camera parameters "h", "v", "C_x", "C_y" and "f" are as defined
previously, the symbol "T" denotes the matrix transpose, and the symbol "tr"
denotes the matrix trace.
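
The relationship M = A^T F A can be checked numerically with a short sketch.
It assumes that A is the matrix which maps millimetre coordinates (x*, y*, f)
to homogeneous pixel coordinates (x, y, 1) in a manner consistent with
equations (1) and (2), and that both cameras share the same parameters; the
function names are hypothetical. The further scaling that produces E from M is
not reproduced here.

```python
import numpy as np

def calibration_matrix(h, v, cx, cy, f):
    """Matrix A taking (x*, y*, f) in mm to homogeneous pixels (x, y, 1).

    Assumed form, obtained by inverting equations (1) and (2).
    """
    return np.array([
        [1.0 / h, 0.0, cx / f],
        [0.0, -1.0 / v, cy / f],
        [0.0, 0.0, 1.0 / f],
    ])

def unnormalised_essential(F, A):
    """Return M = A^T F A, which is proportional to the essential matrix."""
    return A.T @ F @ A
```

Under these assumptions, the value of the left-hand side of equation (4) for a
pair of pixel points equals the value of the same expression evaluated with M
and the corresponding millimetre coordinates, so a pair that satisfies one
constraint also satisfies the other.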



The calculated essential matrix, E, is then converted into a physical essential
matrix, E_phys, by finding the closest matrix to E which is decomposable
directly into a translation vector (of unit length) and rotation matrix (this
closest matrix being E_phys).



Finally, the physical essential matrix is converted into a physical fundamental
matrix, using the equation:

$$F_{phys} = (A^{-1})^{T} E_{phys} A^{-1} \qquad (9)$$

where the symbol "-1" denotes the matrix inverse.



Each of the physical essential matrix, E_phys, and the physical fundamental
matrix, F_phys, is a "physically realisable matrix", that is, it is directly
decomposable into a rotation matrix and translation vector.

The physical fundamental matrix, F_phys, defines a curved surface in a four-
dimensional space, represented by the coordinates (x, y, x', y') which are
known as "concatenated image coordinates". The curved surface is given by
Equation (4) above, which defines a 3D quadric in the 4D space of
concatenated image coordinates.

At step S146, the calculated physical fundamental matrix is tested against each
pair of points that were used to calculate the fundamental matrix at step S142.
This is done by calculating an approximation to the 4D Euclidean distance (in
the concatenated image coordinates) of the 4D point representing each pair of
points from the surface representing the physical fundamental matrix. This
distance is known as the "Sampson distance", and is calculated in a
conventional manner, for example as described in "Robust Detection of
Degenerate Configurations Whilst Estimating the Fundamental Matrix" by
P.H.S. Torr, A. Zisserman and S. Maybank, Oxford University Technical
Report 2090/96.

Figure 11 shows the processing operations performed at step S146 to test the
physical fundamental matrix. 

Referring to Figure 11, at step S170, a counter is set to zero. At step S172, the
tangent plane of the surface representing the physical fundamental matrix at
the four-dimensional point defined by the coordinates of the next pair of points
in the seven pairs of points (the two coordinates defining each point in the pair
being used to define a single point in the four-dimensional space of the
concatenated image coordinates) is calculated. Step S172 effectively
comprises shifting the surface to touch the point defined by the coordinates of
the pair of points, and calculating the tangent plane at that point. This is
performed in a conventional manner, for example as described in "Robust
Detection of Degenerate Configurations Whilst Estimating the Fundamental
Matrix" by P.H.S. Torr, A. Zisserman and S. Maybank, Oxford University
Technical Report 2090/96.

At step S174, the normal to the tangent plane determined at step S172 is
calculated, and, at step S176, the distance along the normal from the point in
the 4D space defined by the coordinates of the pair of matched points to the
surface representing the physical fundamental matrix (the "Sampson distance")
is calculated.

At step S178, the calculated distance is compared with a threshold which, in
this embodiment, is set at 1.0 pixels. If the distance is less than the threshold,
then the point lies sufficiently close to the surface, and the physical
fundamental matrix is considered to accurately represent the relative positions
of the cameras 26 and 28 for the particular pair of matched points being
considered. Accordingly, if the distance is less than the threshold, at step
S180, the counter which was initially set to zero at step S170 is incremented,
the points are stored, and the distance calculated at step S176 is stored.
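
The test of steps S170 to S180 can be sketched as follows. The sketch uses the
widely known closed-form first-order (Sampson) approximation rather than
constructing the tangent plane and normal explicitly, and it returns an unsigned
distance. The function names are hypothetical; only the 1.0 pixel threshold is
taken from the text.

```python
import numpy as np

def sampson_distance(F, p, p_prime):
    """First-order distance of a matched pair from the surface defined by F.

    p = (x, y) in the first image, p_prime = (x', y') in the second image.
    Assumes the gradient terms in the denominator are not all zero.
    """
    x = np.array([p[0], p[1], 1.0])
    xp = np.array([p_prime[0], p_prime[1], 1.0])
    Fx = F @ x
    Ftxp = F.T @ xp
    residual = xp @ Fx  # left-hand side of equation (4)
    gradient_sq = Fx[0] ** 2 + Fx[1] ** 2 + Ftxp[0] ** 2 + Ftxp[1] ** 2
    return abs(residual) / np.sqrt(gradient_sq)

def count_consistent(F, pairs, threshold=1.0):
    """Count pairs closer than `threshold` pixels, keeping them and their distances."""
    count = 0
    kept = []
    for p, p_prime in pairs:
        distance = sampson_distance(F, p, p_prime)
        if distance < threshold:
            count += 1
            kept.append((p, p_prime, distance))
    return count, kept
```

In terms of Figure 11, `count` plays the role of the counter incremented at
step S180 and `kept` holds the stored points and distances.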

At step S182, it is determined whether there is another pair of points in the 
seven pairs of points used to calculate the fundamental matrix, and steps S172
to S182 are repeated until all such points have been processed as described
above.

Referring again to Figure 10, at step S148, it is determined whether the
physical fundamental matrix calculated at step S144 is sufficiently accurate to
justify further processing to test it against all of the pairs of matched points in
the combined set. In this embodiment, step S148 is performed by determining
whether the counter value set at step S180 (indicating the number of pairs of
points which have a distance less than the threshold tested at step S178, and
hence are considered to be consistent with the physical fundamental matrix)
is equal to 7. That is, it is determined whether the physical fundamental
matrix is consistent with all of the points used to calculate the fundamental
matrix from which the physical fundamental matrix was derived. If the
counter is less than 7, the physical fundamental matrix is not tested further,
and processing proceeds to step S152. On the other hand, if the counter value
is equal to 7, at step S150, the physical fundamental matrix is tested against
each other pair of matched points. This is performed in the same way as step
S146 described above, with the following exceptions: (i) at step S170, the
counter is set to 7 to reflect the seven pairs of points already tested at step
S146 and determined to be consistent with the physical fundamental matrix,
and (ii) the total error for all points stored at step S180 (including those stored
during processing at step S146) is calculated, using the following equation:




....(10)

where e_i is the distance for the "i"th pair of matched points between the 4D
point represented by their coordinates and the surface representing the physical
fundamental matrix calculated at step S176, this value being squared so that
it is unsigned (thereby ensuring that the side of the surface representing the
physical fundamental matrix on which the point lies does not affect the result),
p is the total number of points stored at step S180, and e_th is the distance
threshold used in the comparison at step S178.

The effect of step S150 is to determine whether the physical fundamental
matrix calculated at step S144 is accurate for each pair of matched points in
the combined set, with the value of the counter at the end (step S180)
indicating the total number of the points for which the calculated matrix is
sufficiently accurate.

At step S152, it is determined whether the physical fundamental matrix tested
at step S150 is more accurate than any previously calculated using the
perspective calculation technique. This is done by comparing the counter
value stored at step S180 in Figure 11 for the last-calculated physical
fundamental matrix (this value representing the number of points for which the
physical fundamental matrix is an accurate camera solution) with the
corresponding counter value stored for the most accurate physical fundamental
matrix previously calculated. The matrix with the highest number of points
(counter value) is taken to be the most accurate. If the number of points is the
same for two matrices, the total error for each matrix (calculated as described
above) is compared, and the most accurate matrix is taken to be the one with
the lowest error. If it is determined at step S152 that the physical fundamental
matrix is more accurate than the currently stored one, then, at step S154, the
previous one is discarded, and the new one is stored together with the number
of points (counter value) stored at step S180 in Figure 11, the points
themselves, and the total error calculated for the matrix.

At step S156, it is determined whether there is another pair of matched points
which has not yet been considered, such that there is another unique set of
seven pairs of matched points in the combined set to be processed. Steps S140
to S156 are repeated until each unique set of seven pairs of matched points has
been processed in the manner described above.
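
The loop of steps S140 to S156 can be sketched as an exhaustive search that
keeps the most accurate matrix found so far. The sketch assumes that "each
unique set" means every seven-element subset of the combined set; on that
reading, the minimum of 28 matched pairs gives 1,184,040 candidate sets. The
callables `estimate` and `score` are hypothetical stand-ins for the matrix
calculation (steps S142 and S144) and the testing (steps S146 to S150).

```python
from itertools import combinations
from math import comb

def is_more_accurate(candidate, best):
    """Compare (count, total_error) scores as at step S152.

    More consistent points wins; a tie is broken by the lower total error.
    """
    if best is None:
        return True
    if candidate[0] != best[0]:
        return candidate[0] > best[0]
    return candidate[1] < best[1]

def search_best(matched_pairs, sample_size, estimate, score):
    """Try every unique sample and keep the most accurate model.

    `estimate(sample)` returns a model; `score(model)` returns
    (count, total_error). Both are hypothetical callables.
    """
    best_model, best_score = None, None
    for sample in combinations(matched_pairs, sample_size):
        model = estimate(sample)
        candidate = score(model)
        if is_more_accurate(candidate, best_score):
            best_model, best_score = model, candidate
    return best_model, best_score

# Number of seven-element subsets of 28 matched pairs
NUM_SETS_OF_SEVEN = comb(28, 7)
```

The same comparison rule applies to sets of four pairs, for which 28 matched
pairs give 20,475 subsets.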

Referring again to Figure 9, at step S132, an affine relationship for the
matched points in the combined set is calculated, tested and stored.

Figure 12 shows the processing operations performed at step S132.

Referring to Figure 12, at step S200, the next four pairs of matched points are
selected for processing (this being the first four pairs the first time step S200
is performed).

When performing the perspective calculations (step S130 in Figure 9), it is
possible to calculate all of the components of the fundamental matrix, F.
However, when the relationship between the cameras is an affine relationship,
it is possible to calculate only four independent components of the
fundamental matrix, these four independent components defining what is
commonly known as an "affine" fundamental matrix.



Accordingly, at step S202, the four pairs of points selected at step S200 and
the measurement matrix set at step S98 are used to calculate four independent
components of the fundamental matrix (giving the "affine" fundamental
matrix) using a technique such as that described in "Affine Analysis of Image
Sequences" by L.S. Shapiro, Section 5, Cambridge University Press 1995,
ISBN 0-521-55063-7. It is possible to select more than four pairs of points at
step S200 and to use these to calculate the affine fundamental matrix at
step S202. However, in the present embodiment, only four pairs are selected
since this has been shown empirically to produce satisfactory results, and also
represents the minimum number required to calculate the components of the
affine fundamental matrix, reducing processing requirements.

At step S204, the affine fundamental matrix is tested against each pair of
matched points in the combined set using a technique such as that described
in "Affine Analysis of Image Sequences" by L.S. Shapiro, Section 5,
Cambridge University Press, 1995, ISBN 0-521-55063-7. The affine
fundamental matrix represents a flat surface (hyperplane) in four-dimensional,
concatenated image space, and this test comprises determining the distance
between a point in the four-dimensional space defined by the coordinates of
a pair of matched points and the flat surface representing the affine
fundamental matrix. As with the tests performed during the perspective
calculations at steps S146 and S150 (Figure 10), the test performed at step S204
generates a value for the number of pairs of points for which the affine
fundamental matrix represents a sufficiently accurate solution to the camera
transformation and a total error value for these points.
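
The hyperplane view of the affine fundamental matrix can be illustrated with a
least-squares sketch. It fits a hyperplane through concatenated image points
(x, y, x', y') using a singular value decomposition and measures the
perpendicular distance of a point from it. This is a generic fit offered under
the assumption that it captures the geometry described; it is not necessarily
the exact procedure of the cited reference, and the function names are
hypothetical.

```python
import numpy as np

def fit_affine_hyperplane(points4d):
    """Fit n . r + offset = 0 through 4D points r = (x, y, x', y').

    With four points in general position the fit is exact; with more points
    it is the least-squares hyperplane.
    """
    pts = np.asarray(points4d, dtype=float)
    centroid = pts.mean(axis=0)
    # The right singular vector with the smallest singular value is the normal
    _, _, vt = np.linalg.svd(pts - centroid)
    normal = vt[-1]
    offset = -normal @ centroid
    return normal, offset

def hyperplane_distance(normal, offset, point4d):
    """Perpendicular 4D distance of a concatenated image point from the hyperplane."""
    point = np.asarray(point4d, dtype=float)
    return abs(normal @ point + offset) / np.linalg.norm(normal)
```

The five numbers (four normal components and the offset) are defined only up to
a common scale, which leaves four independent components, in line with the
description of the affine fundamental matrix above.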

At step S206, it is determined whether the affine fundamental matrix
calculated at step S202 and tested at step S204 is more accurate than any
previously calculated. This is done by comparing the number of points for
which the matrix represents an accurate solution with the number of points for
the most accurate affine fundamental matrix previously calculated. The matrix
with the highest number of points is taken to be the most accurate. If the number of points
is the same, the matrix with the lowest error is the most accurate. If the affine
fundamental matrix is more accurate than any previously calculated, then at
step S208, it is stored together with the points for which it represents a
sufficiently accurate solution, the total number of these points and the matrix
total error.



At step S210, it is determined whether there is another pair of matched points
to be considered, such that there exists another unique set of four pairs of
matched points in the combined set to be processed. Steps S200 to S210 are
repeated until each unique set of four pairs of matched points has been processed in
the manner described above.



Referring again to Figure 9, at step S134, the most accurate transformation is
selected from the perspective transformation calculated at step S130 and the
affine transformation calculated at step S132. This step is performed by
comparing the number of points which are consistent with the most accurate
perspective transformation (stored at step S154) with the number of points
which are consistent with the most accurate affine transformation (stored at
step S208), and selecting the transformation which has the highest number of
consistent points (or the transformation having the lowest matrix total error if
the number of consistent points is the same for both transformations).
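
The selection rule at step S134 amounts to a two-key comparison, which might be sketched as below. The tuple layout and the function name are hypothetical. The behaviour when both the point count and the error are equal is not stated in the text, so this sketch simply keeps the perspective candidate in that case.

```python
def select_transformation(perspective, affine):
    """Pick the better of two candidates, each given as (name, consistent_points, total_error)."""
    # A higher number of consistent points wins; on a tie, the lower total error wins.
    # max() returns the first maximal item, so a full tie keeps the perspective candidate.
    return max((perspective, affine), key=lambda cand: (cand[1], -cand[2]))
```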






Referring again to Figure 8, at step S104, it is determined whether the affine
transformation is the most accurate camera transformation.

If it is determined at step S104 that the affine transformation is not the most
accurate transformation, then, at step S106, the perspective transformation
which was determined at step S100 is selected for use during the video
conference. Subsequently, at step S108, the physical fundamental matrix for
the perspective transformation is converted to a camera rotation matrix and
translation vector. This conversion is performed in a conventional manner, for
example as described in the above-referenced "Motion and Structure from
Two Perspective Views: Algorithms, Error Analysis and Error Estimation" by
J. Weng, T.S. Huang and N. Ahuja, IEEE Transactions on Pattern Analysis
and Machine Intelligence, Vol. 11, No. 5, May 1989, pages 451-476.

In the processing described above with respect to Figure 10, a fundamental
matrix is calculated (step S142) and converted to a physical fundamental
matrix (step S144) for testing against the matched points (steps S146 and
S150). This has the advantage that, although additional processing is required
to convert the fundamental matrix to a physical fundamental matrix, the
physical fundamental matrix ultimately converted at step S108 has itself been
tested. If the fundamental matrix itself were tested, it would then have to be
converted to a physical fundamental matrix which would not, itself, have been
tested.

On the other hand, if it is determined at step S104 that the affine
transformation is the most accurate transformation, then, at step S110, the
affine transformation is selected for use during the video conference.

At step S112, the affine fundamental matrix is converted into three physical
variables describing the camera transformation, namely the magnification,
"m", of the object between images recorded by the cameras, the axis, φ, of
rotation of the camera, and the cyclotorsion rotation, θ, of the camera. The
conversion of the affine fundamental matrix into these physical variables is
performed in a conventional manner, for example as described in "Affine
Analysis of Image Sequences" by L.S. Shapiro, Cambridge University Press,
1995, ISBN 0-521-55063-7, Section 7.

Referring again to Figure 7, at step S64, the position of the headset LEDs 56,
58, 60, 62 and 64 relative to the head of user 44 is determined. This step is
performed since this relative position will depend on how the user has placed
the headset 30 on his head. More particularly, as illustrated in Figure 13, the
plane 130 in which the headset LEDs lie is determined by the angle at which
the user wears the headset 30. Accordingly, the plane 130 of the headset
LEDs may be different to the actual plane 132 of the user's head. At step S64,
therefore, processing is carried out to determine the angle θ between the
plane 130 of the headset LEDs and the actual plane 132 of the user's head.
Figure 14 shows the processing operations performed at step S64.

Referring to Figure 14, at step S230, central controller 100 displays a message 
on monitor 34 instructing the user 44 to look directly at the camera to his right 
(that is, camera 28 in this embodiment). 

At step S232, a frame of image data is recorded with both camera 26 and
camera 28 while the user is looking directly at camera 28.

At step S234, the synchronous frames of image data recorded at step S232 are
processed to calculate the 3D positions of the headset LEDs 56, 58, 60, 62 and
64.

Figure 15 shows the processing operations performed at step S234 to calculate
the 3D positions of the headset LEDs.

Referring to Figure 15, at step S250, the position of each headset LED 56, 58, 
60, 62 and 64 is identified in each of the images recorded at step S232. The 
identification of the LED positions at step S250 is carried out in the same way 
as previously described with respect to step S92 (Figure 8). 

At step S252, the positions of the next pair of LEDs matched between the pair
of images are considered (this being the first pair the first time step S252 is
performed), and the camera transformation model previously determined at
step S62 (Figure 7) is used to calculate the projection of a ray from the
position of the LED in the first image through the optical centre of the camera
for the first image, and from the position of the matched LED in the second
image through the optical centre of the camera for the second image. This is
illustrated in Figure 16. Referring to Figure 16, ray 140 is projected from the
position of an LED (such as LED 56) in the image 142 recorded by camera 26
through the optical centre of camera 26 (not shown), and ray 144 is projected
from the position of the same LED in image 146 recorded by camera 28,
through the optical centre of camera 28 (not shown).



Referring again to Figure 15, at step S254, the mid-point 148 (Figure 16) of
the line segment which connects, and is perpendicular to, both of the rays
projected in step S252 is calculated. The position of this mid-point represents
the physical position of the LED in three dimensions.
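
The mid-point calculation of step S254 can be illustrated with the standard closest-point construction for two lines in three dimensions. This is a sketch, not the specification's own formulation: each ray is assumed to be given as a point and a direction vector already expressed in a common coordinate frame, and the function name is hypothetical.

```python
def closest_midpoint(p1, d1, p2, d2):
    """Mid-point of the shortest segment joining line p1 + s*d1 and line p2 + t*d2."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    w0 = [a - b for a, b in zip(p1, p2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; no unique common perpendicular")
    # Parameters of the points where the common perpendicular meets each line.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    q1 = [p + s * k for p, k in zip(p1, d1)]
    q2 = [p + t * k for p, k in zip(p2, d2)]
    return tuple((u + v) / 2 for u, v in zip(q1, q2))
```

With noise-free data the two rays intersect and the mid-point coincides with the intersection. With real image data the rays generally pass close to each other without meeting, which is why the mid-point is used.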

At step S256, it is determined whether there is another one of the LEDs 56, 58,
60, 62 or 64 to be processed. Steps S252 to S256 are repeated until the
three-dimensional coordinates of each of the LEDs have been calculated as
described above.

Referring again to Figure 14, at step S236, the plane 130 (Figure 13) in which
the three-dimensional positions of the headset LEDs lie is determined, and the
angle θ between this plane and the imaging plane of the camera at which the
user was looking when the frames of image data were recorded at step S232
is calculated. Since the user was looking directly at the camera to his right
when the frames of image data were recorded at step S232, the direction of the
imaging plane of the camera to the user's right corresponds to the direction of
the plane 132 of the user's head (Figure 13). Accordingly, the angle calculated
at step S236 is the angle θ between the plane 130 of the headset LEDs and the
plane 132 of the user's head.
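
As an illustration of step S236, the angle between two planes can be obtained from their unit normals. The sketch below makes two simplifying assumptions that are not in the text: the LED plane is taken through just three of the reconstructed LED positions (a least-squares fit over all five would be more robust), and the result is an unsigned angle in degrees. The function names are hypothetical.

```python
import math


def plane_normal(p, q, r):
    """Unit normal of the plane through three non-collinear 3D points."""
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    length = math.sqrt(sum(c * c for c in n))
    if length == 0.0:
        raise ValueError("points are collinear")
    return tuple(c / length for c in n)


def angle_between_planes(n1, n2):
    """Unsigned angle in degrees between planes with unit normals n1 and n2."""
    cosine = abs(sum(a * b for a, b in zip(n1, n2)))
    return math.degrees(math.acos(min(1.0, cosine)))
```

A practical implementation would also need the sense of the tilt so that the offset can later be applied in the correct direction.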


Referring again to Figure 7, at step S66, the position of the display screen of 
monitor 34 is determined and a coordinate system is defined relative to this 
position. 

Figure 17 shows the processing operations performed at step S66.

Referring to Figure 17, at step S270, central controller 100 displays a message
on monitor 34 instructing the user to sit centrally and parallel to the display
screen of the monitor 34, and to sit upright with his torso touching the edge of
the desk on which PC 24 stands. At step S272, a further message is displayed
instructing the user to turn, but not otherwise change the position of, his head,
so that the processing in the steps which follow can be carried out on the basis
of a constant head position but changing head angle.

At step S274, the direction of the plane of the display screen of monitor 34 is 
determined. In this embodiment, this is done by determining the direction of 
a plane parallel to the display screen. 

Figure 18 shows the processing operations performed at step S274. 

Referring to Figure 18, at step S300, central controller 100 displays a marker
in the centre of the display screen of monitor 34, and instructs the user to look
directly at the displayed marker.

At step S302, a frame of image data is recorded with both camera 26 and 28
as the user looks at the displayed marker in the centre of the screen of
monitor 34.

At step S304, the three-dimensional positions of the coloured markers 72 on
the user's torso are determined. This step is carried out in the same way as
step S234 in Figure 14, which was described above with respect to Figures 15
and 16, the only difference being that, since the positions of the coloured
markers 72 in each image are determined (rather than the positions of the
headset LEDs), the coloured markers are identified in each of the
synchronised images. Accordingly, these steps will not be described again
here.

At step S306, the three-dimensional positions of the user's headset LEDs are 
calculated. This step is also carried out in the same way as step S234 in 
Figure 14, described above with respect to Figures 15 and 16. 

At step S308, the plane in which the three-dimensional positions of the headset
LEDs (determined at step S306) lie is calculated.

At step S310, the direction of the plane determined at step S308 is adjusted by
the angle θ determined at step S64 (Figure 7) between the plane of the headset
LEDs and the plane of the user's head. The resulting direction is the direction
of a plane parallel to the plane of the display screen, since the plane of the
user's head will be parallel to the display screen when the user is looking
directly at the marker in the centre of the screen.

Referring again to Figure 17, at step S276, the position in three dimensions of 
the plane of the display screen of monitor 34 is determined. 

Figure 19 shows the processing operations performed at step S276. 

Referring to Figure 19, at step S320, central controller 100 displays a marker
in the centre of the right edge of the display screen of monitor 34, and displays
a message instructing the user to look at the marker.

At step S322, a frame of image data is recorded with both camera 26 and 28
as the user looks at the marker displayed at the edge of the display screen.

At step S324, the angle of the user's head relative to the display screen about
a vertical axis is determined.

Figure 20 shows the processing operations performed at step S324.


Referring to Figure 20, at step S340, the three-dimensional positions of the 
headset LEDs are calculated. This step is carried out in the same manner as 
step S234 in Figure 14, and described above with respect to Figures 15 and 16. 
Accordingly, the processing operations will not be described again here. 

At step S342, the plane which passes through the three-dimensional positions
of the headset LEDs is determined, and, at step S344, the position of this plane
is adjusted by the headset offset angle θ (calculated at step S64 in Figure 7)
to give the plane of the user's head.

At step S346, the angle between the direction of the plane of the user's head
determined at step S344 and the direction of the plane parallel to the display
screen determined at step S274 (Figure 17) is calculated. This calculated angle
is the angle of the user's head relative to the plane of the display screen about
a vertical axis, and is illustrated in Figure 21 as angle "a".

Referring again to Figure 19, at step S326, the three-dimensional position of
the display screen is calculated and stored for subsequent use. In this step, the
width of the display screen previously input by the user at step S46 and stored
at step S48 (Figure 7) is used together with the angle determined at step S324
of the user's head when looking at a point at the edge of the display screen to
calculate the 3D position of the display screen. More particularly, referring
to Figure 21, the distance "d" of the plane parallel to the display screen
determined at step S274 (Figure 17) is calculated using the angle "a" and one
half of the width "W" of the display screen, thereby determining the
three-dimensional position of the plane of the display screen. The extents of the
display screen in the horizontal direction are then determined using the width
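
The text does not give the formula for the distance "d". One plausible reading, assuming the user sits centrally so that turning the head by the angle "a" moves the gaze from the screen centre to the screen edge, is tan(a) = (W/2)/d. The sketch below implements that assumed relationship only; the function name is hypothetical.

```python
import math


def screen_distance(width, head_angle_deg):
    """Distance from the head to the screen plane, assuming tan(a) = (W / 2) / d."""
    if not 0.0 < head_angle_deg < 90.0:
        raise ValueError("head angle must lie strictly between 0 and 90 degrees")
    return (width / 2.0) / math.tan(math.radians(head_angle_deg))
```

For instance, with a 12 inch wide screen and a head angle of 45 degrees this relationship gives a distance of 6 inches.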

Referring again to Figure 17, at step S278, a three-dimensional coordinate
system and scale is defined relative to the three-dimensional position of the
display screen. This coordinate system will be used to define the
three-dimensional position of points which are transmitted to the other
participants during the video conference. Accordingly, each participant uses
the same coordinate system and scale, and therefore transmits coordinates
which can be interpreted by the other participants. Referring to Figure 22, in
this embodiment, the coordinate system is defined with the origin at the centre
of the display screen, the "x" and "y" axes lying in the plane of the display
screen in horizontal and vertical directions respectively, and the "z" axis lying
in a direction perpendicular to the plane of the display screen in a direction
towards the user. The scale for each axis is predefined (or could, for example,
be transmitted to each user station by the conference coordinator).

Also at step S278, the transformation is calculated which maps
three-dimensional coordinates calculated using the camera transformation model
determined at step S62 to the new, standardised coordinate system and scale.
This transformation is calculated in a conventional manner, with scale changes
being determined by using the width of the user's head in real-life (determined
at step S60 in Figure 7) and the distance "a" between each of LEDs 56 and 64
and the inner surface of the earphones 48, 50 (Figure 2C) to determine the
distance between the LEDs 56 and 64 in real-life when the headset 30 is worn
by the user, and by using this real-life LED separation to relate the distance
between the three-dimensional coordinates of the headset LEDs 56 and 64
calculated using the camera transformation model at step S306 in Figure 18 to
the predefined scale of the standard coordinate system.
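
The scale determination described above might be sketched as follows. It rests on an assumption that the text implies but does not state as a formula: the real-life separation of LEDs 56 and 64 is the head width plus the offset "a" on each side. The function names and the units_per_length parameter are hypothetical.

```python
import math


def led_separation(head_width, led_offset):
    """Assumed real-life distance between LEDs 56 and 64: head width plus 'a' on each side."""
    return head_width + 2.0 * led_offset


def scale_factor(real_separation, led56, led64, units_per_length=1.0):
    """Factor converting reconstructed coordinates into standard coordinate units."""
    reconstructed = math.dist(led56, led64)  # separation in camera-model units
    if reconstructed == 0.0:
        raise ValueError("LED positions coincide")
    return real_separation * units_per_length / reconstructed
```

Multiplying reconstructed coordinates by this factor, after the rotation and translation into the screen-centred frame, would express them at the predefined scale.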

At step S280, the three-dimensional positions of the body markers 72
previously calculated at step S304 (Figure 18) are transformed into the
standard coordinate system defined at step S278.

At step S282, the three-dimensional positions of the body markers 72 in the
standard coordinate system are transmitted to the other participants in the
video conference, for subsequent use in positioning the user's avatar in the
three-dimensional computer model of the conference room, as will be
described below.

Referring again to Figure 7, at step S68, a three-dimensional computer model
is set up of the conference room table to be used for the video conference. In
this embodiment, three-dimensional computer models are pre-stored of a
rectangular and semi-circular conference room table, and the appropriate
model is selected for use in dependence upon the instructions received from
the conference room coordinator at step S40 defining the shape of the
conference room table to be used.

In addition, name labels showing the name of each of the participants are
placed on the conference room table in the three-dimensional computer model,
with the name displayed on each label being taken from the names of the
participants received from the conference coordinator at step S40. In order to
determine the positions for the name labels on the conference table, the seating
position of each participant is first determined using the seating plan received
from the conference coordinator at step S40. Although the conference
coordinator defined the seating plan by defining the order of the participants
in a circle (step S24 in Figure 5, and Figure 6), at step S68 the positions of the
avatars around the conference room table are set so that, when an image of the
avatars and conference room table is displayed to the user, the avatars are
spread apart across the width of the display screen of monitor 34. In this way,
each avatar occupies its own part of the display screen in the horizontal
direction and all of the avatars can be seen by the user.

Figures 23A, 23B, 23C, 23D and 23E illustrate how the positions of avatars
are set in this embodiment for different numbers of participants in the video
conference.

Referring to Figures 23A, 23B, 23C, 23D and 23E in general, the avatars are
spaced apart evenly around a semi-circle 164 in three dimensions. The
diameter of the semi-circle 164 (which is the same irrespective of the number
of participants in the video conference) and the viewing position from which
images are rendered for display to the user are chosen so that each avatar
occupies a unique position across the display screen and the outermost
avatars are close to the edges of the display screen in the horizontal direction.
In this embodiment, the avatars are positioned around semi-circle 164 and a
viewing position is defined such that the positions at which the avatars appear
in an image are shown in the table below.



NUMBER OF AVATARS      POSITION OF AVATAR IN IMAGE
DISPLAYED              (W = screen width)

2                      ±0.46W
3                      0.00W; ±0.46W
4                      ±0.20W; ±0.46W
5                      0.00W; ±0.20W; ±0.46W
6                      ±0.12W; ±0.34W; ±0.46W



Table 1 
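
For illustration, Table 1 can be held as a simple lookup keyed by the number of avatars displayed. The dictionary values below are taken directly from the table, expressed as fractions of the screen width measured from the screen centre. The conversion to pixel columns, the names and the rounding are assumptions added for the example.

```python
# Horizontal avatar positions from Table 1, as fractions of the screen width W,
# measured from the centre of the screen (negative values are to the left).
AVATAR_POSITIONS = {
    2: [-0.46, 0.46],
    3: [-0.46, 0.00, 0.46],
    4: [-0.46, -0.20, 0.20, 0.46],
    5: [-0.46, -0.20, 0.00, 0.20, 0.46],
    6: [-0.46, -0.34, -0.12, 0.12, 0.34, 0.46],
}


def avatar_pixel_columns(num_avatars, screen_width_px):
    """Pixel column of each avatar, counting from the left edge of the screen."""
    return [round((0.5 + frac) * screen_width_px)
            for frac in AVATAR_POSITIONS[num_avatars]]
```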



Referring to Figure 23A, when there are three participants in the video
conference, the avatars 160 and 162 for the two participants other than the user
at the user station being described are positioned behind the same, straight
edge of a conference room table at the ends of the semi-circle 164. As set out
in the table above, avatar 160 is positioned so that it appears in an image at a
distance -0.46W from the centre of the display screen in a horizontal
direction, and avatar 162 is positioned so that it appears at a distance +0.46W
from the centre. Name plates 166 and 168 showing the respective names of
the participants are placed on the conference room table in front of the avatars
facing the viewing position from which images of the conference room table
and avatars will be rendered. In this way, the user, when viewing the display,
can read the name of each participant.

Figure 23B shows an example in which there are four participants of the video
conference and a rectangular conference room table has been selected by the
conference organiser. Again, the avatars 170, 172 and 174 for the three
participants other than the user at the user station are arranged around the
semi-circle 164 with equal spacing. Avatar 170 is positioned so that it appears
in an image at a distance -0.46W from the centre of the display screen in a
horizontal direction, avatar 172 is positioned so that it appears at the centre of
the display screen (in a horizontal direction), and avatar 174 is positioned so
that it appears at a distance +0.46W from the centre. A name label 176, 178,
180 is placed on the conference room table facing the viewing position from
which images of the conference room table and avatars will be rendered.

Figure 23C shows an example in which there are four participants of the video
conference, as in the example of Figure 23B, but the conference coordinator
has selected a circular conference room table. In this case, the edge of the
model of the conference room table follows the semi-circle 164.

Figure 23D shows an example in which there are seven participants in the
video conference, and a rectangular conference room table is specified by the
conference coordinator. The avatars 190, 192, 194, 196, 198, 200 for each of
the participants other than the user at the user station are equally spaced
around semi-circle 164, such that, when an image is rendered, the avatars
occupy positions of -0.46W, -0.34W, -0.12W, +0.12W, +0.34W and
+0.46W respectively from the centre of the display screen in a horizontal
direction. A name label 202, 204, 206, 208, 210, 212 is provided for each
participant facing the viewing position from which images will be rendered so
that the participants' names are visible in the image displayed on monitor 34
to the user.

The relative positions and orientations of the avatars around the conference
room table will be different for the participant at each user station. Referring
to the seating plan shown in Figure 6, and assuming that the user at the user
station being described is participant 1, then participant 2 is to the left of the
user and participant 7 is to the right of the user. Accordingly, as shown in
Figure 23D, the position of avatar 190 for participant 2 is set so that it appears
on the left of the image, and the position of avatar 200 for participant 7 is set
so that it appears on the right of the image. The positions of avatars 192, 194,
196 and 198 for participants 3, 4, 5 and 6 respectively are arranged between
the positions of avatars 190 and 200 in accordance with the order defined in
the seating plan.

Similarly, by way of further example, the positions of the avatars would be set
at the user station of participant 2 so that the order of the participants from left
to right in an image is 3, 4, 5, 6, 7 and 1.

Figure 23E shows an example in which there are seven participants in the
video conference, as in the example of Figure 23D, except that a circular
conference room table is specified by the conference coordinator.
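
The left-to-right ordering described for participants 1 and 2 is a rotation of the circular seating plan. A minimal sketch, assuming participants are numbered consecutively from 1 around the circle as in the examples above (the function name is hypothetical):

```python
def display_order(num_participants, user):
    """Participant numbers shown from left to right at the station of `user`."""
    # Start with the participant seated after the user and continue round the
    # circle, wrapping from the last participant back to participant 1.
    return [((user - 1 + k) % num_participants) + 1
            for k in range(1, num_participants)]
```

Under this assumption, display_order(7, 1) gives 2 to 7 in order, and display_order(7, 2) gives 3, 4, 5, 6, 7, 1, matching the two cases in the text.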

Referring again to Figure 7, at step S70, a respective transformation is defined
for each participant which maps the avatar for the participant from the local
coordinate system in which it was stored at step S40 into the three-dimensional
computer model of the conference room created at step S68 so that the avatar
appears at the correct position at the conference room table. In this step, the
three-dimensional positions of the body markers 72 previously received from
each participant (as transmitted at step S282 in Figure 17) when the participant
was sitting with his torso against the edge of his desk are used to determine the
transformation such that the edge of the user's desk maps to the edge of the
conference room table where the avatar is placed.



At step S72, data is stored, for example in memory 106, defining the
relationship between each of the avatars which will be displayed to the user
(that is, the avatars of the other participants) and the horizontal position on the
display screen of monitor 34 at which the avatar will be displayed. As
described above with respect to step S68, the avatars are positioned in the
conference room model such that the position at which each avatar will appear
across the display screen in a horizontal direction when an image is rendered
is fixed. Accordingly, in this embodiment, data defining these fixed positions
for each different number of participants is pre-stored in memory 106, and, at
step S72, the data defining the fixed positions for the correct number of
participants is selected and each of the fixed positions is assigned a participant
number (received from the conference coordinator at step S40) defining the
participant displayed at that position. More particularly, as will now be
described with reference to Figure 24, data defining a piece-wise linear
function between the fixed positions of the avatars is stored and the participant
numbers are associated with this data at step S72.

Referring to Figure 24, data for the display of six avatars is shown
(corresponding to the examples described previously with respect to
Figure 23D and Figure 23E). The vertical axis in Figure 24 shows horizontal
screen position, and values on this axis range from -0.5 (corresponding to a
position on the left hand edge of the screen) to +0.5 (corresponding to a
position on the right hand edge of the screen). The horizontal axis has six
equally spaced divisions 400, 402, 404, 406, 408 and 410, each of which
corresponds to a participant. Accordingly, the value of the function at each of
these positions on the horizontal axis is -0.46, -0.34, -0.12, +0.12, +0.34 and
+0.46 respectively (as shown by the dots in Figure 24) since these are the
horizontal screen positions at which the avatars for six participants will be
displayed. Data is also stored defining a piece-wise linear function between
each of these values. At step S72, each of the six positions on the horizontal
axis is assigned a participant number corresponding to the participant whose
avatar will be displayed at the associated horizontal screen position. Referring
to the seating plan shown in Figure 6, in this example, position 400 is
allocated participant number 2, position 402 is allocated participant number 3,
position 404 is allocated participant number 4, position 406 is allocated
participant number 5, position 408 is allocated participant number 6 and
position 410 is allocated participant number 7. It should be noted that the
numbers for each of these positions will be different for each user
station. By way of example, at the user station for participant 2, the
participant numbers allocated to positions 400, 402, 404, 406, 408 and 410
will be 3, 4, 5, 6, 7 and 1 respectively.

As a result of allocating the participant numbers, the piece-wise linear function
therefore defines, for each horizontal screen position, a so-called "view
parameter" V for the user which defines which participant in the conference
room the user is looking at when he is looking at a particular position on the
display screen of monitor 34. As will be explained below, during the video
conference, processing is carried out to determine the horizontal position on
the display screen at which the user is looking, and this is used to read the "view
parameter" V for the user, which is then transmitted to the other participants
to control the user's avatar.

Referring again to Figure 7, at step S74, when all of the preceding steps in
Figure 7 have been completed, a "ready" signal is transmitted to the
conference coordinator indicating that the user station has been calibrated and
is now ready to start the video conference.

Referring again to Figure 4, at step S8, the video conference itself is carried 
out. 

Figure 25 shows the processing operations which are performed to carry out
the video conference.

Referring to Figure 25, the processes at steps S370, S372, S374-1 to S374-6,
S376 and S378 are carried out simultaneously.

At step S370, frames of image data are recorded by cameras 26 and 28 as the
user participates in the video conference, that is as the user views the images
of the avatars of the other participants on monitor 34, listens to the sound data
from the other participants and speaks into microphone 52. Synchronous
frames of image data (that is, one frame from each camera which were
recorded at the same time) are processed by image data processor 104 at video
frame rate to generate in real time data defining the three-dimensional
coordinates of the body markers 70, 72, the view parameter V defining where
the user was looking in the conference room when the images were recorded,
and pixel data for the face of the user. This data is then transmitted to all of
the other participants. Step S370 is repeated for subsequent pairs of frames of
image data until the video conference ends.

Figure 26 shows the processing operations performed at step S370 for a given 
pair of synchronised frames of image data. 

Referring to Figure 26, at step S390, synchronous frames of image data are
processed to calculate the three-dimensional coordinates of the headset
LEDs 56, 58, 60, 62, 64 and body markers 70, 72 which are visible in both of
the images. This step is carried out in the same way as step S234 in Figure 14,
and described above with respect to Figures 15 and 16, except that the
processing is performed for the body markers 70, 72 in addition to the headset
LEDs. Accordingly, this processing will not be described again here.

At step S392, the plane of the user's head is determined by finding the plane
which passes through the three-dimensional positions of the headset LEDs
calculated at step S390 and adjusting this plane by the headset offset angle θ
previously determined at step S64 (Figure 7).


At step S394, a line is projected from the plane of the user's head in a direction
perpendicular to this plane, and the intersection of the projected line with the
display screen of monitor 34 is calculated. This is illustrated in Figures 27A,
27B and 27C.

Referring to Figure 27A, in this embodiment, the mid-point 220 of the line
between the three-dimensional coordinates of the headset LEDs 58 and 62 is
determined and a line 218 is projected from the calculated mid-point 220
perpendicular to the plane 224 of the user's head (which was calculated at
step S392 by determining the plane 228 of the headset LEDs and adjusting this
by the headset offset angle θ). As described above with respect to step S50
(Figure 7), the headset LEDs 58 and 62 are aligned with the user's eyes so that,
in this embodiment, the projected line 218 is not only perpendicular to the
plane 224 of the user's head, but also passes through a point on this plane
representative of the position of the user's eyes.

Referring to Figure 27B, the projected line 218 intersects the plane of the
display screen of monitor 34 at a point 240. In step S394, the horizontal
distance "h" shown in Figure 27C of the point 240 from the centre of the
display screen (that is, the distance between the vertical line in the plane of the
display screen on which point 240 lies and the vertical line in the plane of the
display screen on which the centre point of the display lies) is calculated using
the three-dimensional coordinates of the display screen previously determined
at step S66 (Figure 7) during calibration.
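
The intersection calculation can be illustrated in the standard coordinate system described earlier, in which the display screen lies in the plane z = 0 with the origin at its centre and the "x" axis horizontal. In that frame the distance "h" is simply the x coordinate of the intersection point. The sketch assumes that the mid-point and head-plane normal have already been transformed into this frame; the function name is hypothetical.

```python
def gaze_offset(midpoint, normal):
    """Horizontal offset h of the point where the gaze line meets the screen plane z = 0.

    midpoint -- (x, y, z) of the point between LEDs 58 and 62, with z > 0 towards the user
    normal   -- direction of the projected line, perpendicular to the head plane
    """
    mx, _, mz = midpoint
    nx, _, nz = normal
    if nz == 0:
        raise ValueError("gaze line is parallel to the screen plane")
    t = -mz / nz  # line parameter at which z reaches 0
    if t < 0:
        raise ValueError("gaze line points away from the screen")
    return mx + t * nx
```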

Referring again to Figure 26, at step S396, the view parameter V defining
where the user was looking when the frames of image data being processed
were recorded is determined. More particularly, the ratio of the distance "h"
calculated at step S394 to the width "W" of the display screen stored at
step S48 (Figure 7) is calculated and the resulting value is used to read a value
for the view parameter V from the data stored at step S72 during calibration.
By way of example, if the distance "h" is calculated to be 2.76 inches and the
width "W" of the display screen is 12 inches (corresponding to a 15 inch
monitor), then a ratio of 0.23 would be calculated and, referring to Figure 24,
this would cause a view parameter "V" of 5.5 to be generated. As can be seen
from the example shown in Figures 27B and 27C, the projected ray 218
indicates that the user 44 is looking between participants 5 and 6, and hence
a view parameter of 5.5 would define this position.
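
The worked example above can be reproduced with a small piece-wise linear lookup. The screen positions and participant numbers are those given for Figure 24 at the station of participant 1. The clamping of ratios beyond the outermost avatars is an assumption, since the text does not say what happens there, and the names are hypothetical.

```python
# Fixed horizontal screen positions for six avatars and the participant
# numbers allocated to them at the user station of participant 1.
SCREEN_POSITIONS = [-0.46, -0.34, -0.12, 0.12, 0.34, 0.46]
PARTICIPANTS = [2, 3, 4, 5, 6, 7]


def view_parameter(h, width, positions=SCREEN_POSITIONS, participants=PARTICIPANTS):
    """Read the view parameter V for a gaze offset h on a screen of the given width."""
    ratio = h / width
    # Assumed behaviour: clamp to the outermost avatar beyond either end.
    if ratio <= positions[0]:
        return float(participants[0])
    if ratio >= positions[-1]:
        return float(participants[-1])
    for i in range(len(positions) - 1):
        x0, x1 = positions[i], positions[i + 1]
        if x0 <= ratio <= x1:
            frac = (ratio - x0) / (x1 - x0)
            return participants[i] + frac * (participants[i + 1] - participants[i])
```

Here 2.76 / 12 = 0.23 lies halfway between 0.12 (participant 5) and 0.34 (participant 6), so the lookup returns approximately 5.5. At a station where the allocated numbers wrap around (for example 7 followed by 1), plain numeric interpolation between neighbours would need separate handling, which this sketch does not attempt.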

Referring again to Figure 26, at step S398, the direction of the imaging plane
of each of the cameras 26 and 28 (that is, the plane in which the CCD of the
camera lies) is compared with the direction of the plane of the user's head
calculated at step S392 to determine which camera has an imaging plane most
parallel to the plane of the user's head. Referring again to Figure 27B, for the
example illustrated, it will be seen that the imaging plane 250 for camera 28
is more parallel to the plane 224 of the user's head than the imaging plane 252
of camera 26. Accordingly, in the example illustrated in Figure 27B, camera
28 would be selected at step S398.

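One possible way of performing the comparison at step S398 is sketched below in Python, on the assumption that each plane is represented by a normal vector: two planes are most nearly parallel when the absolute value of the dot product of their unit normals is largest. The function names and the numerical normals in the usage note are hypothetical.

```python
import math


def _unit(v):
    """Scale a 3D vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def most_parallel_camera(head_normal, camera_normals):
    """Pick the camera whose imaging plane is most parallel to the head plane.

    camera_normals maps a camera identifier to the normal of its imaging plane.
    """
    h = _unit(head_normal)

    def alignment(normal):
        # |cos| of the angle between the normals; 1.0 means parallel planes.
        return abs(sum(a * b for a, b in zip(h, _unit(normal))))

    return max(camera_normals, key=lambda cam: alignment(camera_normals[cam]))
```

For example, with a head-plane normal of (0.3, 0, 1) and illustrative camera normals of (-0.7, 0, 0.7) for camera 26 and (0.5, 0, 0.87) for camera 28, the sketch selects camera 28.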


At step S400, the frame of image data from the camera selected at step S398
is processed to extract the pixel data representing the user's face in the image.
In this embodiment, this step is performed using the three-dimensional
positions of the headset LEDs 56 and 64 calculated at step S390, the size and
ratio of the user's head determined at step S60 (Figure 7) and the distance "a"
between each LED 56, 64 and the inner surface of the corresponding
earpiece 48, 50 (which, as noted above, is pre-stored in PC 24). More
particularly, using the three-dimensional positions of the headset LEDs 56 and
64, and the distance "a", the points representing the extents of the width of the
user's head in three dimensions are determined. These extent points are then
projected back into the image plane of the camera selected at step S398 using
the camera transformation determined at step S62 (Figure 7). The projected
points represent the extents of the width of the user's head in the image, and,
using the value of this width and the ratio of the user's head length, the extents
of the user's head length in the image are determined. Pixels representing the
image between the extents of the width of the user's head and the extents of the
length of the user's head are then extracted. In this way, image data is not
extracted which shows the headset 30 which the user is wearing.



At step S401, the three-dimensional coordinates of the body markers 70, 72
calculated at step S390 are transformed into the standardised coordinate
system previously defined at step S66 in Figure 7.

At step S402, MPEG 4 encoder 108 encodes the face pixel data extracted at
step S400, the 3D coordinates of the body markers 70, 72 generated at
step S401 and the view parameter determined at step S396 in accordance with
the MPEG 4 standard. More particularly, the face pixel data and the 3D
coordinates are encoded as a Movie Texture and Body Animation Parameter
(BAP) set and, since the MPEG 4 standard does not directly provide for the
encoding of a view parameter, this is encoded in a general user data field. The
encoded MPEG 4 data is then transmitted to the user stations of each of the
other participants via input/output interface 110 and the Internet 20.

Referring again to Figure 25, at step S372, sound produced by user 44 is
recorded with microphone 52 and encoded by MPEG 4 encoder 108 in
accordance with the MPEG 4 standard. The encoded sound is then transmitted
to the other participants by input/output interface 110 and the Internet 20.

At steps S374-1 to S374-6, MPEG 4 decoder 112, model processor 116 and
central controller 100 perform processing to change the avatar models stored
in avatar and 3D conference model store 114 in dependence upon the MPEG 4
encoded data received from the other participants. More particularly, in
step S374-1 processing is performed to change the avatar of the first external
participant using the data received from that participant, in step S374-2 the
avatar of the second external participant is changed using data received from
the second external participant, and so on. Steps S374-1 to S374-6 are
performed simultaneously, in parallel.

Figure 28 shows the processing operations performed in each of steps S374-1
to S374-6.

Referring to Figure 28, at step S420, MPEG 4 decoder 112 awaits further data
from the participant. When data is received, it
is decoded by the MPEG 4 decoder, and the decoded data is then passed to
model processor 116 at step S422, where it is read to control subsequent
processing by model processor 116 and central controller 100.

At step S424, the positions of the avatar body and arms are changed in the
three-dimensional coordinate system in which the avatar is stored in avatar and 3D
conference model store 114 so that the body and arms of the avatar fit the
received three-dimensional coordinates of the body markers 70, 72 of the
actual participant. In this way, the pose of the avatar is made to correspond
to the real-life pose of the actual participant which the avatar represents.

At step S426, the face pixel data in the bitstream received from the participant
is texture mapped onto the face of the avatar model in three dimensions.

At step S428, the avatar is transformed from the local coordinate system in
which it is stored into the three-dimensional model of the conference room
using the transformation previously defined at step S70 (Figure 7).

At step S430, the head of the transformed avatar in the three-dimensional
conference room model is changed in dependence upon the view parameter,
V, of the participant defined in the received bitstream. More particularly, the
head of the avatar is moved in three dimensions so that the avatar is looking
at the position defined by the view parameter. For example, if the view
parameter, V, is 5, then the avatar's head is moved so that the avatar is looking
at the position in the three-dimensional conference room at which participant
5 is seated. Similarly, if, for example, the view parameter is 5.5, then the
avatar's head is rotated so that the avatar is looking mid-way between the
positions at which the fifth and sixth participants sit in the three-dimensional
conference room.
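
The mapping from a received view parameter to a head direction in the local conference room model might, purely by way of example, be sketched as follows in Python. The seat coordinates, the two-dimensional (x, z) representation, the linear interpolation for fractional view parameters and the function names are assumptions made for illustration and are not taken from the embodiment.

```python
import math


def gaze_target(v, seats):
    """Return the (x, z) point an avatar should look at for view parameter v.

    seats maps a participant number to that participant's (x, z) position
    in the conference room model stored at this user station.
    """
    lo, hi = math.floor(v), math.ceil(v)
    t = v - lo  # 0.0 for a whole number, 0.5 for "mid-way"
    (x0, z0), (x1, z1) = seats[lo], seats[hi]
    # Assumption: fractional values interpolate linearly between two seats.
    return (x0 + t * (x1 - x0), z0 + t * (z1 - z0))


def head_yaw(head_position, target):
    """Yaw in degrees about the vertical axis; 0 means facing along +z."""
    dx = target[0] - head_position[0]
    dz = target[1] - head_position[1]
    return math.degrees(math.atan2(dx, dz))
```

Because the seat positions are those of the local model, the same received view parameter yields a different yaw angle at each user station whose model arranges the avatars differently.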

Figures 29A, 29B and 29C illustrate how the position of the avatar's head is
changed in the conference room model in dependence upon changes in the
position of the participant's head in real-life.

Referring to Figure 29A, an example is shown in which participant 1 in real-life
is initially looking at participant 2 (or more particularly, the avatar of
participant 2) on the display screen of his monitor, and then rotates his head
through an angle β1 to look at participant 7 on the display screen. In real-life,
the angle of rotation β1 would be approximately 20°-30° for typical screen
sizes and seating positions from the screen.

Figure 29B represents the images seen by participant 3 of the video
conference. When the head of participant 1 in real-life is looking at
participant 2, then the head of the avatar 300 of participant 1 is positioned so
that it, too, is looking at the avatar of participant 2 in the three-dimensional
model of the conference room stored at the user station of participant 3. As
the first participant rotates his head in real-life to look at participant 7, the
head of the avatar 300 undergoes a corresponding rotation to look at the avatar
of participant 7 in the three-dimensional conference room model. However,
the angle β2 through which the head of avatar 300 moves is not the same as
angle β1 through which the head of the first participant moves in real-life. In

relative positions of the avatars in the conference room model. Consequently, 
the motion of the heads of the avatars does not take place in the same 
coordinate system as that of the motion of the heads of the actual participants 
in real-life. 



The change in angle of the head of avatar 300 will be different for each user
station since the arrangement of the avatars in the three-dimensional
conference room model is different at each user station. Figure 29C illustrates
how the head of avatar 300 moves in the image displayed at the user station
of participant 2 as participant 1 moves his head in real-life through the angle
β1 to look from participant 2 to participant 7. Referring to Figure 29C, since
participant 1 is originally looking at participant 2, the head of avatar 300 is
originally directed towards the viewing position from which the image is
rendered for display to participant 2. As participant 1 rotates his head through
angle β1 in real-life, the head of avatar 300 is rotated through angle β3 so that
the head is looking at the avatar of participant 7 in the three-dimensional
model of the video conference room stored at the user station of participant 2.
The angle β3 is different to both β1 and β2.



Referring again to Figure 25, at step S376, image renderer 118 and central
controller 100 generate and display a frame of image data on monitor 34
showing the current status of the three-dimensional conference room model
and the avatars therein. The processing performed at step S376 is repeated to
display images at video rate, showing changes as the avatars are updated in
response to changes of the participants in real-life.

Figure 30 shows the processing operations performed at step S376. 

Referring to Figure 30, at step S450, an image of the three-dimensional 
conference room model is rendered in a conventional manner to generate pixel 
data, which is stored in frame buffer 120. 

At step S452, the current view parameter V determined at step S370 in Figure 
25 (which occurs in parallel) is read. As noted above, this view parameter 
defines the position on the monitor at which the user is determined to be 
looking, relative to the avatars displayed. 

At step S454, the image data generated and stored at step S450 is amended 
with data for a marker to show the position at which the user is determined to 
be looking in accordance with the view parameter read at step S452. 

At step S456, the pixel data now stored in frame buffer 120 is output to 
monitor 34 to display an image on the display screen. 

Figure 31 illustrates the display of markers in accordance with the user's
current view parameter V.

Referring to Figure 31, if for example it is determined at step S452 that the
user's current view parameter is 5, then at step S454, image data for arrow 310
is added so that, when the image is displayed at step S456, the user sees
arrow 310 indicating that he is determined to be looking at participant 5 and

participants. Accordingly, if the displayed marker does not accurately indicate 
the user's intended viewing direction, the user can change the position of his 
head whilst watching the position of the marker change until the correct 
viewing direction is determined and transmitted to the other users.

By way of further example, if the user's view parameter is 6.5, then arrow 320 
would be displayed (instead of arrow 310) indicating a position mid-way 
between the avatars of participants 6 and 7. 

Referring again to Figure 25, at step S378, MPEG 4 decoder 112, central
controller 100 and sound generator 122 perform processing to generate sound
for the user's headset 30.

Figure 32 shows the processing operations performed at step S378.

Referring to Figure 32, at step S468 the input MPEG 4 bitstreams received
from each participant are decoded by MPEG 4 decoder 112 to give a sound
stream for each participant.

At step S470, the current head position and orientation for each avatar in the
coordinate system of the three-dimensional computer model of the conference
room are read, thereby determining a sound direction for the sound for each
of the avatars.

At step S472, the current head position and orientation of the user (to whom
the sound will be output) is read (this having already been determined at step
S370 in Figure 25), thereby defining the direction for which the output sound
is to be generated.

At step S474, the input sound streams decoded at step S468, the direction of
each sound stream determined at step S470 and the output direction for which
sound is to be generated determined at step S472 are input to the sound
generator 122, where processing is carried out to generate left and right output
signals for the user's headset 30. In this embodiment, the processing in sound
generator 122 is performed in a conventional manner, for example as
described in "The Science of Virtual Reality and Virtual Environments" by
R.S. Kalawsky, Addison-Wesley Publishing Company, ISBN 0-201-63171-7,
pages 184-187.

In the processing described above, at step S472, the user's current head
position and orientation are used to determine an output direction which is
subsequently used in the processing of the sound streams at step S474. In this
way, the sound which is output to the headset 30 of the user changes in
dependence upon the user's head position and orientation, even though the
images which are displayed to the user on monitor 34 do not change as his
head position and orientation change (other than the displayed marker
indicating where the user is looking).
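
The dependence of the output sound on the user's head orientation can be illustrated with the following Python sketch. It is not the processing described in the Kalawsky reference; it substitutes a simple constant-power stereo pan, and the (x, z) coordinates, angle convention and function names are hypothetical.

```python
import math


def relative_azimuth(listener_pos, listener_yaw_deg, source_pos):
    """Angle of a sound source relative to where the listener's head faces.

    Positions are (x, z) pairs; yaw is in degrees with 0 facing along +z.
    The result is wrapped to the range [-180, 180).
    """
    dx = source_pos[0] - listener_pos[0]
    dz = source_pos[1] - listener_pos[1]
    azimuth = math.degrees(math.atan2(dx, dz)) - listener_yaw_deg
    return (azimuth + 180.0) % 360.0 - 180.0


def stereo_gains(azimuth_deg):
    """Constant-power pan: -90 is fully left, +90 is fully right.

    Simplification: sources behind the listener are clamped to the sides.
    """
    a = max(-90.0, min(90.0, azimuth_deg))
    theta = math.radians((a + 90.0) / 2.0)  # 0 to 90 degrees
    return math.cos(theta), math.sin(theta)  # (left gain, right gain)
```

In this sketch, a source directly ahead gives equal left and right gains, whereas if the user turns his head 90° to the right the same source moves fully into the left channel, even though the displayed image is unchanged.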

A number of modifications are possible to the embodiment of the invention
described above.

For example, in the embodiment described above, the cameras 26 and 28 at
each user station record images of a single user at the user station and
processing is performed to determine transmission data for the single user.
However, the cameras 26 and 28 may be used to record images of more than
one user at each user station and processing may be carried out to generate the
face pixel data, the three-dimensional coordinates of the body markers and the
view parameter for each of the users at the user station, and to transmit this
data to the other participants to facilitate the animation of an avatar
corresponding to each one of the users.

In the embodiment above, at steps S42 and S44 (Figure 7), camera parameters
are input by the user. However, each of the cameras 26, 28 may be arranged
to store these parameters and to pass them to PC 32 when the camera is connected
to the PC.

In the embodiment above, LEDs 56, 58, 60, 62 and 64 are provided on headset 
30. However, other forms of lights or identifiable markers may be provided 
instead. 

In the embodiment described above, the headset LEDs 56, 58, 60, 62, 64 are
continuously illuminated and have different colours to enable them to be
identified in an image. Instead of having different colours, the LEDs could be
arranged to flash at different rates to enable them to be distinguished by
comparison of images over a plurality of frames, or the LEDs may have
different colours and be arranged to flash at different rates.

In the embodiment above, the coloured body markers 70, 72 may be replaced 
by LEDs. Also, instead of using coloured markers or LEDs, the position of the 
user's body may be determined using sensors manufactured by Polhemus Inc., 
Vermont, USA, or other such sensors. 

In the embodiment above, in the processing performed at step S370
(Figure 25) data for the whole of each image is processed at step S390 (Figure
26) to determine the position of each LED and each coloured body marker in
the image. However, the position of each LED and each body marker may be
tracked through successive frames of image data using conventional tracking
techniques, such as Kalman filtering techniques, for example as described in
"Affine Analysis of Image Sequences" by L.S. Shapiro, Cambridge University
Press, 1995, ISBN 0-521-55063-7, pages 24-34.

In the embodiment above, at step S72 (Figure 7), data is stored defining the
relationship between horizontal screen position and the view parameter V.
Further, at step S396 (Figure 26), this stored data is used to calculate the view
parameter to be transmitted to the other participants in dependence upon the
horizontal distance between the point on the display screen at which the user
is looking and the centre of the display screen. This method of determining the
view parameter V is accurate when the viewing position from which the 3D
model of the conference room and avatars is rendered is such that the
participants are displayed to the user with their heads at substantially the same
vertical height on the screen. However, errors can occur when the viewing
position is such that the heads of the participants are at different heights on the
display screen. To address this, it is possible to store data at step S72 defining
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the relationship between the view parameter V and the distance of each avatar 
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point on arc 164 which is nearest to the point on the screen at which the user 
is looking and use the calculated point on arc 164 to read the view parameter 
V which is to be transmitted to the other participants from the stored data. 
Further, although in the embodiment above the viewing position from which 
the 3D conference room model and avatars are rendered is fixed, it is possible 
to allow the user to vary this position. The view parameter V would then be 
calculated most accurately using the positions of the avatars around arc 164 as 
described above. 

In the embodiment above, in the processing performed at step S370 
(Figure 25), the user's view parameter is determined in dependence upon the 
orientation of the user's head. In addition, or instead, the orientation of the
user's eyes may be used. 

In the embodiment above, the sound from the user's own microphone 52 is fed 
to the user's headphones 48, 50. However, the user may be able to hear his 
own voice even when wearing the headphones, in which case such processing 
is unnecessary. 

In the processing performed at step S62 (Figure 7) in the embodiment above,
both a perspective camera transformation and an affine transformation are
calculated and tested (steps S130 and S132 in Figure 9). However, it is
possible to calculate and test just an affine transformation and, if the test
reveals acceptable errors, to use the affine transformation during the video
conference, or, if the test reveals unacceptable errors, to calculate and use a
perspective transformation.



In the embodiment above, the names of the participants displayed on the name
plates are based on the information provided by each participant to the
conference coordinator at step S20 (Figure 5). However, the names may
alternatively be based on other information, such as the log-on information of
each participant at a user station, the telephone number of each user station,
or information provided in the data defining the avatar of each participant.

In the embodiment above, at step S400 (Figure 26), the face pixel data is
extracted following processing to determine the extents of the user's head such
that the extracted pixel data will not contain pixels showing the headset 30.
Instead, the pixel data may be extracted from an image by simply extracting
all data bounded by the positions of the LEDs 56, 60 and 64 and using the
user's head ratio to determine the data to extract in the direction of the length
of the user's face. Conventional image data interpolation techniques could
then be used to amend the pixel data to remove the headset 30.

In the embodiment above, a view parameter V is calculated to define the
position of the head of an avatar. In this way, movements of the user's head
in real-life are appropriately scaled to give the correct movement of the
avatar's head in the three-dimensional conference room models at the user
stations of the other participants. In addition, it is also possible to perform
corresponding processing for user gestures, such as when the user points, nods
his head, etc. at a particular participant (avatar) on his display screen.

In the embodiment above, processing is performed by a computer using
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all, of the processing could be performed using hardware. 

In the embodiment above, two cameras 26 and 28 are used at each user station
to record frames of image data of the user 44. The use of two cameras enables
three-dimensional position information to be obtained for the headset LEDs
and body markers. However, instead, a single camera could be used together
with a range finder to provide depth information. Further, a single calibrated
camera could be used on its own, with depth information obtained using a
standard technique, for example as described in "Computer and Robot Vision,
Volume 2" by R.M. Haralick and L.G. Shapiro, Addison-Wesley Publishing
Company, 1993, ISBN 0-201-56943-4, pages 85-91.

Instead of using LEDs or coloured markers to determine the position of the 
user's head, arms and torso, conventional feature matching techniques could 
be used to match natural features of the user in each of the images in a pair of 
synchronised images. Examples of conventional techniques are given in "Fast
visual tracking by temporal consensus" by A.H. Gee and R. Cipolla in Image
and Vision Computing, 14(2):105-114, 1996, in which nostrils and eyes are
tracked and "Learning and Recognising Human Dynamics in Video 
Sequences" by C. Bregler, Proceedings IEEE Conference on Computer Vision 
and Pattern Recognition, June 1997, pages 568-574, in which blobs of motion 
and colour similarity corresponding to arms, legs and torso are tracked. 

In the embodiment above, as well as including an avatar of each of the other
participants in the 3D computer model of the conference room stored at each
user station, a three-dimensional computer model of one or more objects, such
as a whiteboard, flip chart etc. may also be stored. The position of such an
object may be defined in the seating plan defined by the conference
coordinator at step S24 (Figure 5). Similarly, data defining a three-dimensional
computer model of one or more characters (a person, animal etc.)
which is to be animated during the video conference but whose movements are
not related to the movements of one of the participants at the conference may
also be stored in the three-dimensional computer model of the conference
room at each user station. For example, the movements of such a character
may be computer-controlled or controlled by a user.

In the embodiment above, at step S68 (Figure 7), the positions of the avatars 
around the conference room table are set using the values given in Table 1. 
However, other positions may be used. For example, the avatars may be 
arranged so that their horizontal positions on the display screen are given by 
the following equation: 

$$W_n = 0.46\,W\cos\left(\frac{\pi i}{N-1}\right) \tag{11}$$

where: $N$ is the number of avatars displayed on the screen
$W_n$ is the position of the nth avatar ($n = 1 \ldots N$)
$i = n - 1$
$W$ is the screen width

Alternatively, rather than arranging the avatars at equally spaced positions
around a semi-circle as in the embodiment above or arranging the avatars so
that their positions on the display screen are given by equation (11) above,
processing apparatus 32 may perform processing to calculate a position for
each avatar in the 3D computer model of the conference room such that,
firstly, the minimum movement that the head of an avatar appears to undergo
to switch gaze from one participant to another participant in the conference is
maximised and, secondly, the avatars appear evenly spaced across a horizontal
line on the display screen of monitor 34.

Accordingly, by maximising the minimum apparent head movement, it is 
easier for the user viewing the display to detect when an avatar has changed 
its gaze direction and at which of the other avatars it is now looking. 
Similarly, by arranging the avatars in the 3D computer model of the 
conference room so that they appear evenly spaced across the display screen, 
occlusion between the avatars and large amounts of unused display space are
avoided. 

Figure 33 shows processing operations that can be performed by processing
apparatus 32 to calculate positions for the avatars in the 3D computer model
of the conference room such that the minimum apparent head movement for
the avatars is maximised and the avatars appear to the viewer to be evenly
spaced across the display screen.

Referring to Figure 33, at step S500, values of parameters to be used in the
processing are set.

Figure 34 shows the processing operations performed by processing apparatus
32 at step S500.

Referring to Figure 34, at step S520, the value of the number of avatars to be 
displayed to the user on monitor 34 is read (this being the value received from 
the conference coordinator at step S40 in Figure 7). 

At step S522, the average distance of the user from the display screen of
monitor 34 is calculated. More particularly, the average distance is calculated 
as the average of the user's foremost position and rearmost position determined 
from the image data recorded at step S56 (Figure 7) using the position of the 
display screen determined at step S66 (Figure 7). 

At step S524, the half-width (that is, W/2 in Figure 21) is calculated based on
the full screen width stored at step S48 (Figure 7). 

At step S526, the average distance calculated at step S522 is converted to a 
multiple of the half-screen width calculated at step S524, thereby giving the 
average distance of the user from the display screen in terms of the screen
half-width. 
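
The arithmetic of steps S522 to S526 may be sketched as follows in Python. The function name and the numerical values in the usage note are hypothetical and are given only to show the conversion of units.

```python
def distance_in_half_widths(foremost, rearmost, screen_width):
    """Average user distance from the screen, as a multiple of W / 2.

    foremost and rearmost are the user's nearest and furthest distances
    from the display screen, in the same units as screen_width.
    """
    average = (foremost + rearmost) / 2.0  # step S522
    half_width = screen_width / 2.0        # step S524
    return average / half_width            # step S526
```

For instance, a user who moves between 18 and 30 inches from a screen 12 inches wide is, on average, 24 inches away, which is 4.0 screen half-widths.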

At step S528, a value defining the minimum size that an avatar can have when
displayed on the display screen is read. More particularly, as in the example
shown in Figures 23B to 23E, one or more of the avatars will be positioned in
the 3D conference room model at a position further away from the viewing
position from which an image of the model is rendered (the "rendering
viewpoint") than some of the other avatars. Accordingly, the avatar(s)
further away from the rendering viewpoint will have a smaller size on the
display screen than the avatars which are closer to the rendering viewpoint.
The value read at step S528 therefore defines the smallest size which an avatar
is allowed to take on the display screen. In this embodiment, the size is
defined as a relative value, that is, the size is defined as a fraction of the size
of the largest avatars which appear on the display screen. In this embodiment,
the minimum value is pre-stored as 0.3 (so that the smallest avatar cannot be
less than 30% of the size of the largest avatar), although this value could be
determined and input by a user.

At step S530, the minimum display size read at step S528 is used to calculate
the maximum distance along the z-axis at which an avatar may be placed in the
3D computer model of the conference room. That is, a z-value is calculated
beyond which avatars cannot be placed in the 3D conference room model
because their size would become too small on the display screen. More
particularly, in this embodiment, the maximum z-value, $z_{max}$, is calculated
using the following equation:

$$z_{max} = k\left(\frac{1}{S_{min}} - 1\right) \tag{12}$$

where: $k$ is the distance of the user from the display screen as a multiple
of the screen half-width (calculated at step S526);
$S_{min}$ is the minimum display size (read at step S528), which can
take values $0 < S_{min} \le 1$.

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At step S532, a value defining the maximum display size which the smallest 
avatar in the display can take is read. As explained above with reference to
step S528, one or more of the avatars will be positioned in the 3D computer
model of the conference room further away from the rendering viewpoint than
other avatars. The value read at step S532 defines the maximum size which
the avatar(s) furthest from the rendering viewpoint can take. In this
embodiment, the maximum display value is defined as a relative value, that
is, a fraction of the size of the avatars which have the maximum size in the
display (that is, the avatars closest to the rendering viewpoint). As with the
minimum display value read at step S528, the maximum display value is
pre-stored, but could be input by the user. In this embodiment, the value of the
maximum display size is set to 1.0 so that, if necessary, all of the avatars can
take the same size in the display.

At step S534, the maximum display size value read at step S532 is used to
calculate the minimum distance along the z-axis at which an avatar can be
placed in the 3D computer model of the conference room in order for the size
of the avatar on the display not to exceed the maximum display value read at
step S532. More particularly, in this embodiment, the minimum z-value,
$z_{min}$, is calculated as follows:

$$z_{min} = k\left(\frac{1}{S_{max}} - 1\right) \tag{13}$$

where: $k$ is as defined above for equation (12);
$S_{max}$ is the maximum display size (read at step S532), which can
take values $0 < S_{max} \le 1$.

At step S536, a value defining the z-axis resolution to be used in calculating 
avatar configurations is read. As will be explained further below, this 
resolution value defines a step size along the z-axis to be used for calculating 
different positions of the avatars in the 3D computer conference room model. 
In this embodiment, the z-axis resolution is pre-stored, and has the value 0.1.
However, the z-axis resolution could be input by the user of PC 24. 
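
The limits calculated at steps S530 and S534, and the z-axis steps defined at step S536, might be sketched as follows in Python. The sketch assumes that equations (12) and (13) both take the form z = k(1/S - 1), which is the form that results if the displayed size of an avatar at depth z behind the screen is proportional to k/(k + z). The function names are hypothetical and the resolution is simply a parameter.

```python
def z_limit(k, relative_size):
    """Depth z at which an avatar is displayed at the given relative size.

    k is the user's distance from the screen in screen half-widths.
    Assumed form of equations (12) and (13): z = k * (1 / S - 1).
    """
    if not 0.0 < relative_size <= 1.0:
        raise ValueError("relative size must satisfy 0 < S <= 1")
    return k * (1.0 / relative_size - 1.0)


def z_steps(z_min, z_max, resolution):
    """Candidate z-values from z_min to z_max in steps of the resolution."""
    count = int((z_max - z_min) / resolution + 1e-9)
    # Rounding removes floating-point noise such as 0.30000000000000004.
    return [round(z_min + i * resolution, 10) for i in range(count + 1)]
```

With k = 4, a maximum display size of 1.0 gives a minimum z-value of 0 (the plane of the display screen), and a minimum display size of 0.3 gives a maximum z-value of approximately 9.33 screen half-widths.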

Referring again to Figure 33, at step S502, all possible configurations of the 
avatars in the conference room 3D model are calculated subject to constraints 
determined by the maximum z value calculated at step S530, the minimum z
value calculated at step S534 and the z-axis resolution read at step S536. 

The processing performed at step S502 will be explained referring to Figures
35A and 35B by way of example, which schematically show a horizontal
cross-section through the display screen 500 of display device 34, the
real-world containing the user viewing the display device (labelled P1), and the 3D
computer model of the conference room, for cases where 5 and 6 avatars are
to be displayed respectively (the positions of avatars in the 3D computer
model being labelled P2, P3, P4, P5 and P6).

In this embodiment, the positions of the two outermost avatars (defined by the
seating plan received from the conference coordinator at step S40 in Figure 7)
are fixed at the edges of the display screen 500 irrespective of the number of
avatars to be displayed, and therefore have (x, z) coordinates (0, 1) and (0, -1)
since the position (0, 0) is defined to be at the centre of the display screen 500
and coordinate values are defined as multiples of the screen half-width.

At step S502, based on the number of avatars to be displayed on the display
screen 500, horizontal rays (510, 520 and 530 in Figure 35A, and 540 and 550
in Figure 35B) are projected from the position of the user viewing the display
(defined by the average distance, "k", calculated at step S526 in Figure 34).
These rays divide the display screen 500 into equal size portions in the
horizontal plane (the portions being of size 1/2 unit in the example of
Figure 35A and 2/3 unit in the example of Figure 35B). The points at which these
rays intersect the display screen 500 define the positions on the display
screen 500 at which the avatars between the two outermost avatars will appear
to the user on the display. Accordingly, the avatars between the two outermost
avatars must be positioned in the 3D computer model of the conference room on
the rays 510, 520, 530 or 540, 550 projected from the position of the viewer.
At step S502, each possible configuration of the avatars between the two
outermost avatars is calculated.
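The ray construction described above can be illustrated with a short sketch. It assumes the gap and projection formulas of routines J and K in Appendix A; the function names `projected_positions` and `position_on_ray` are illustrative only and do not appear in the specification.

```python
def projected_positions(num_avatars):
    """Equally spaced positions across the screen, in screen half-widths.

    The two outermost avatars sit at +1 and -1 (the screen edges); the
    remaining avatars are spaced by gap = 2 / (num_avatars - 1).
    """
    if num_avatars < 2:
        raise ValueError("at least two avatars are needed to fix the screen edges")
    gap = 2.0 / (num_avatars - 1)
    return [1.0 - i * gap for i in range(num_avatars)]


def position_on_ray(screen_pos, depth, k):
    """Lateral coordinate of a point lying `depth` behind the screen on the ray
    from the viewer (at distance k in front of the screen) through `screen_pos`.
    """
    return screen_pos * (k + depth) / k


if __name__ == "__main__":
    # Five avatars, viewer three half-widths from the screen.
    for pos in projected_positions(5):
        print(pos, position_on_ray(pos, depth=1.0, k=3.0))
```

An avatar placed anywhere along its ray projects to the same point on the screen, which is why only the depth of each inner avatar remains to be chosen.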

More particularly, referring to Figure 35A first, the avatar which will have the
smallest size on the display screen 500 is defined to be the avatar labelled P4
since, because this avatar is the central avatar in the seating plan, it will be
the furthest avatar from the rendering viewpoint, which is defined to be the
position of the viewer, P1. Accordingly, P4 must lie along ray 520 between
the minimum and maximum z values calculated at steps S534 and S530
respectively (Figure 34). The avatars P3 and P5 each have the same z value
because these avatars are placed in the 3D computer model symmetrically.
However, the z-value of P3 and P5 cannot exceed the z-value of P4 (since P4 is
defined to be the avatar which has the smallest size in the display 500). At
step S502, each possible configuration of the avatars P3, P4 and P5 is
calculated subject to these constraints and the constraint that the minimum
distance in the z direction between avatar positions is given by the z-axis
resolution read at step S536 (0.01 in this embodiment).

In the example shown in Figure 35B, because the number of avatars to be
displayed on display screen 500 is an even number, both of the avatars P3 and
P4 will have the same size on display screen 500 and will be the smallest
avatars displayed. Accordingly, at step S502, each z-position of the points P3
and P4 is considered between the maximum z value and minimum z value calculated
at steps S530 and S534 in increments of 0.01 (the z-axis resolution read at step
S536).

Similar processing is carried out at step S502 for numbers of avatars other than 
the five avatars to be displayed in Figure 35A and the four avatars to be 
displayed in Figure 35B using the same principles described above. 
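As a rough illustration of the enumeration at step S502, the sketch below lists candidate depth configurations for the avatars on one side of the seating plan. It is an assumption-laden sketch rather than the routine of Appendix A: the name `depth_configurations` is hypothetical, the upper limit is treated as inclusive, and the number of free depths is taken as floor((V-1)/2), as in part B of the appendix.

```python
from itertools import combinations_with_replacement


def depth_configurations(num_avatars, z_min, z_max, step):
    """List candidate depth tuples for the inner avatars.

    Each tuple holds one depth per degree of freedom, ordered from the avatar
    next to the screen edge towards the centre. Depths never decrease towards
    the centre, so the central avatar is always the deepest (smallest) one.
    """
    free = (num_avatars - 1) // 2
    # Integer level indices avoid accumulating floating-point error.
    n_levels = int(round((z_max - z_min) / step)) + 1
    levels = [z_min + i * step for i in range(n_levels)]
    return list(combinations_with_replacement(levels, free))


if __name__ == "__main__":
    for config in depth_configurations(5, 0.0, 0.2, 0.1):
        print(config)
```

Because `combinations_with_replacement` emits tuples in the order of its sorted input, every tuple is non-decreasing, which mirrors the constraint that P3 and P5 may not lie deeper than P4.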

At step S504, the next configuration from the configurations calculated at step 
S502 is selected for processing (this being the first configuration the first time 
step S504 is performed). 

At step S506, a value is calculated representing the smallest amount of
movement that the head of any of the avatars in the configuration selected at
step S504 will appear to undergo when the heads of the avatars in the 3D
computer model are rotated to change gaze from one avatar to another.

The processing performed at step S506 will be described referring to Figures
36A, 36B and 36C by way of example, which schematically illustrate a
configuration for five avatars P2, P3, P4, P5 and P6 which are to be displayed
to the user P1 viewing the display.

Referring to Figure 36A, at step S506, the angle 600 through which the head
of avatar P2 must turn to look from avatar P3 to P4 (or vice versa), the angle
610 through which the head of avatar P2 must turn to look from avatar P4 to
P5, the angle 620 through which the head of avatar P2 must turn to look from
avatar P5 to P6, and the angle 630 through which the head of avatar P2 must
turn to look from avatar P6 to the user P1 are calculated. Further, in this
embodiment, each of the calculated angles is scaled by multiplying the angle
by a scale factor, Sθ, given by:

$$S_\theta = \frac{k}{k + z} \qquad \ldots(14)$$

where:  k is as defined above for equation (12);

        z is the z-value of the avatar for which the head turn angle has
        been calculated.

By multiplying a head turn angle by the scale factor given by equation (14),
the angle is converted from an angle in the 3D computer model through which
the head of the avatar will turn to a value representing the movement through
which the user viewing the display screen 500 will see the head of the avatar
turn.
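The scaled head-turn calculation can be sketched as follows. The sketch takes the scale factor to be k/(k + z), as returned by routine L (ScaleFromX) of Appendix A, and represents each position as a (depth, lateral) pair; the function name `apparent_turn` and the tuple layout are assumptions made for illustration.

```python
import math


def apparent_turn(observer, target_a, target_b, k):
    """Apparent angle (radians) through which `observer` turns its head to look
    from `target_a` to `target_b`, as seen by a viewer at distance k.

    Each position is a (depth, lateral) tuple in screen half-widths.
    """
    ax, ay = target_a[0] - observer[0], target_a[1] - observer[1]
    bx, by = target_b[0] - observer[0], target_b[1] - observer[1]
    cos_t = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
    # Clamp to guard against rounding slightly outside [-1, 1].
    angle = math.acos(max(-1.0, min(1.0, cos_t)))
    # Deeper avatars appear smaller, so their turns appear smaller too.
    return angle * k / (k + observer[0])


if __name__ == "__main__":
    print(apparent_turn((3.0, 0.0), (4.0, 0.0), (3.0, 1.0), k=3.0))
```

With this scaling, a right-angle turn made by an avatar as far behind the screen as the viewer is in front of it counts as only half a right angle of apparent movement.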



Referring to Figure 36B, the angle 640 through which the head of avatar P3
must turn to look from avatar P4 to P5, the angle 650 through which the head
of avatar P3 must turn to look from avatar P5 to P6, the angle 660 through
which the head of avatar P3 must turn to look from avatar P6 to user P1, and
the angle 670 through which the head of avatar P3 must turn to look from user
P1 to avatar P2 are calculated and scaled by multiplying them by the scale
factor Sθ given by equation (14) above.

Likewise, referring to Figure 36C, the angle 680 through which the head of
avatar P4 must turn to look from avatar P5 to P6, the angle 690 through which
the head of avatar P4 must turn to look from avatar P6 to user P1, the angle
700 through which the head of avatar P4 must turn to look from user P1 to
avatar P2, and the angle 710 through which the head of avatar P4 must turn to
look from avatar P2 to avatar P3 are calculated and scaled by the scale factor
given by equation (14) above.

The head turn angles for avatars P5 and P6 do not need to be calculated at step
S506 because these angles are the same as the head turn angles for avatars P3
and P2, respectively (due to the symmetrical positioning of the avatars in the
3D computer model). Similarly, in the case of the example shown in
Figure 35B, head turn angles would be calculated at step S506 for avatars P2
and P3, but not avatars P4 and P5 since these would be the same as the head
turn angles for avatars P3 and P2, respectively.

Also at step S506, the value of the movement which is the smallest value of
those calculated is determined.

At step S508, a test is carried out to determine whether the smallest movement
value identified at step S506 is larger than the currently stored movement
value.

If it is determined at step S508 that the smallest movement value determined
at step S506 is larger than the currently stored movement value (which, by
default, will be the case when step S508 is performed for the first time), then,
at step S510, the currently stored movement value is replaced with the smallest
movement value determined at step S506 and the avatar configuration selected
at step S504 is stored.

On the other hand, if it is determined at step S508 that the smallest movement
value determined at step S506 is not larger than the currently stored movement
value, then step S510 is omitted so that the currently stored movement value
and currently stored avatar configuration are retained.

At step S512, it is determined whether another of the avatar configurations
calculated at step S502 remains to be processed. Steps S504 to S512 are
repeated until each avatar configuration calculated at step S502 has been
processed in the manner described above.
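The loop formed by steps S504 to S512 is a maximise-the-minimum search, and can be sketched generically as below. The names `select_configuration` and `smallest_movement` are hypothetical, and the initial stored value of -1 follows line B12 of Appendix A; the callable stands in for whatever computes the smallest apparent head movement of a configuration.

```python
def select_configuration(configurations, smallest_movement):
    """Return the configuration whose smallest head movement is largest.

    `smallest_movement` is a callable mapping a configuration to the smallest
    apparent head-turn value found for it (step S506).
    """
    best_value = -1.0   # stored movement value; any real result exceeds it
    best_config = None
    for config in configurations:          # step S504: next configuration
        value = smallest_movement(config)  # step S506
        if value > best_value:             # step S508
            best_value = value             # step S510: replace stored value
            best_config = config
    return best_config, best_value


if __name__ == "__main__":
    candidates = [(0.1,), (0.2,), (0.3,)]
    # Toy scoring function used only to exercise the loop.
    print(select_configuration(candidates, lambda c: 1.0 - (c[0] - 0.2) ** 2))
```

Because the comparison is strict, the first of several equally good configurations is the one retained, matching the "not larger" branch in which step S510 is omitted.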

As a result of performing the processing at steps S500 to S512, the positions
in the 3D computer model of the conference room have been calculated for the
avatars to be displayed which ensure that when an image of the 3D computer
model is rendered from the average distance position of the viewer (P1 in
Figures 35A and 35B), the avatars will appear to the viewer (if his head is
actually in that average position) to be equally spaced across a horizontal line
on the display screen 500, and the minimum movement through which the
head of an avatar will be seen to turn to look from one avatar to another or
from one avatar to the user is maximised.

A routine for performing the processing described above at steps S500 to S512
is given in Appendix A, in which:

steps S500 to S512 overall are performed by part A;

step S500 is performed by parts B and C;

step S502 is performed by part D, part E1, part F, part J and part K;

step S506 is performed by parts E2 and E3, part H, part I and part L;
and

steps S508 and S510 are performed by part E4.

Figure 37 shows examples of results of performing steps S500 to S512 for a
distance of the viewer from the viewing screen 500 corresponding to three
screen half-widths (which has been found in practice to be a typical distance
for the viewer when viewing a conventional PC monitor 34).

Referring to Figure 37, the position of the viewer is labelled P1, the positions
in the 3D computer model of the conference room of the two outermost avatars
to be displayed to the user (irrespective of the total number of avatars to be
displayed) are shown by the solid circles 800 and 810, while the position in
the 3D model of the remaining avatar when three avatars are to be displayed
is shown by diamond 820, the positions of the remaining two avatars when
four avatars are to be displayed are shown by squares 830 and 840, the positions
of the remaining three avatars in the 3D model when five avatars are to be
displayed are shown by triangles 850, 860 and 870, and the positions of the
remaining six avatars in the 3D model when eight avatars are to be displayed
are shown by circles 880, 890, 900, 910, 920 and 930. All coordinate values
shown in Figure 37 are expressed as multiples of the screen half-width.

Referring again to Figure 33, at step S514, the currently stored avatar 
configuration is selected as the avatar configuration to be used for the video 
conference. 

When performing processing as set out above to calculate the positions of the
avatars in the 3D conference room model, step S26 in Figure 5 (at which the
conference coordinator selects the shape of the conference room table) is
unnecessary. In addition, when performing step S68 in Figure 7, rather than
selecting a model of the conference room table in accordance with the
instructions from the conference coordinator and then determining the
positions of the avatars around the table, the positions of the avatars are
determined as described above and then a 3D conference room table model is
defined to fit between them.

In the processing described above, the average distance of the viewer from the
screen is calculated at steps S522 and S526 (Figure 34) and the
calculated average distance is used to compute positions in the 3D computer
model of the conference room for the avatars to be displayed to the user. In
this way, the computed configuration remains fixed throughout the video
conference. However, instead, the actual distance of the user from the display
screen 500 may be monitored during the video conference, and the positions
of the avatars in the 3D computer model changed as the distance of the user
from the display screen changes. In this case, rather than perform the
processing described above to re-calculate the positions of the avatars in the
3D computer model of the conference room for each new distance of the user
from the display screen, the calculations may be performed beforehand and
stored in a look-up-table which defines the positions of the avatars in the 3D
computer model of the conference room for different distances of the user
from the display screen 500. The look-up-table may also define these
positions for a different number of avatars to be displayed, thereby enabling
it to be used for different video conferences.



Figure 38 shows an example of a look-up-table 1000, in which the positions
of the avatars in a 3D conference room model are defined for distances of the
user from the display screen 500 corresponding to twice the screen half-width,
three times the screen half-width, four times the screen half-width, and five
times the screen half-width (although in practice avatar positions would be
defined for many other values of the distance of the user from the display
screen 500). In addition, the positions are defined for three, four, five, six,
seven and eight avatars to be displayed.

During the video conference, the actual distance of the user from the display
screen 500 may be calculated and used as an input to the look-up-table to read
the positions in the 3D computer model of the conference room for the
distance of the user in the look-up-table which is closest to the calculated
actual distance of the user.
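A nearest-distance read from such a look-up-table might be sketched as below. The table layout (a dictionary keyed by viewer distance, then by number of avatars) and the name `nearest_layout` are assumptions for illustration, and the stored entries are placeholder strings rather than values taken from Figure 38.

```python
def nearest_layout(table, distance, num_avatars):
    """Read the stored layout for the tabulated distance closest to `distance`.

    `table` maps viewer distance (in screen half-widths) to a dictionary that
    maps the number of avatars to a precomputed layout.
    """
    nearest = min(table, key=lambda d: abs(d - distance))
    return table[nearest][num_avatars]


if __name__ == "__main__":
    # Placeholder entries only; real entries would hold avatar coordinates.
    table = {
        2.0: {3: "layout for k=2, 3 avatars", 4: "layout for k=2, 4 avatars"},
        3.0: {3: "layout for k=3, 3 avatars", 4: "layout for k=3, 4 avatars"},
    }
    print(nearest_layout(table, 2.4, 4))
```

Reading the nearest tabulated distance avoids repeating the configuration search whenever the measured distance changes.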

A look-up-table may also be stored and used to determine the positions of the
avatars in the 3D conference room model even when the positions are to
remain fixed throughout the video conference. For example, the average
distance of the user from the display screen may be used as an input to the
look-up-table to read the positions in the 3D computer model of the conference
room for the distance of the user in the look-up-table which is closest to the
input average distance of the user.

Appendix A

A     FindOptimumLayout
A1        [B] SetUp()               // This reads in input values
A2        R = 0                     // Initialise the recursion counter
A3        [D] DoRecursion()
A4        [E] ComputeConfig(x*)
A5        END

B     SetUp()
B1        Number of Virtual Participants = V (input)   // Number of displayed avatars on screen
B2        N = V+1
B3        Degrees of freedom D = floor( (V-1)/2 )
B4        Number of non symmetrically redundant VPs U = ceil( V/2 )
B5        MinScale = 0.3 (input)    // The minimum scale acceptable relative to
                                    // the peripheral avatars
B6        MaxScale = 1.0 (input)    // The maximum scale acceptable relative to
                                    // the peripheral avatars
B7        Step = 0.01 (input)       // The search step size or resolution
B8        parallax = TRUE (input)   // A flag to indicate that the turn should be
                                    // scaled by the depth (x-coordinate) of the
                                    // avatar, to generate apparent turn or parallax
B9        K = distance between the viewer and the display (as a multiple of screen half-width) (input)
B10       Xmax = [C] XfromScale(MinScale)
B11       Xmin = [C] XfromScale(MaxScale)
B12       optCritTurn = -1          // Set to an arbitrary small number.
B13       x is a D-vector (a vector with D elements, one for each degree of freedom) x[0..D-1]

C     XfromScale(s)   - converts scale to depth, x.
C1        Return X = K*(1/s-1)

D     DoRecursion()
D1        If (R<D)
          {
D2            R++
D3            If (R==1)
D4                For (x[0]=Xmin; x[0]<Xmax; x[0]+=Step)        // For x[0] start search from
                      [D] DoRecursion();                        // Xmin
              Else
D5                For (x[R-1]=x[R-2]; x[R-1]<Xmax; x[R-1]+=Step)   // For x[i], i>0,
                      [D] DoRecursion();                           // start search from x[i-1]
D6            R--
          }
          else
          {
D7            [E] ComputeConfig(x)
          }
          RETURN

E     ComputeConfig(x)
E1        [F] ComputeVPPositions(x)   // Compute the positions of the Virtual Participants or avatars
E2        [H] ComputeTurnAngles(x)    // Now compute the angles or parallax
E3        [I] FindCriticalTurn();     // Find the smallest (critical) turn/parallax for this configuration
E4        If (critTurn>optCritTurn) x* = x, optCritTurn = critTurn   // x* is the optimum
                                                                     // configuration
E5        RETURN

F     ComputeVPPositions(x)
F1        p[0].SetX(-K); p[0].SetY(0.0);      // ...The position of the Real Participant
F2        p[1].SetX(0.0); p[1].SetY(1.0);     // ...The upper-most avatar
F3        for (u = 2; u<=U; u++)
          {
F4            xu = x[u-2]                     // Upper avatar
F5            yu = [J] Y(u,xu);
F6            p[u].SetX(xu), p[u].SetY(yu)
F7            If (u!=V-u+1)
              {
F8                p[V-u+1].SetX(xu), p[V-u+1].SetY(-yu)   // Lower avatar (mirror image of the upper avatar)
              }
          }
F9        p[V].SetX(0.0), p[V].SetY(-1.0)     // ...The lower avatar
          RETURN

H     ComputeTurnAngles(x)
H1        for (j=1; j<=U; j++) {
H2            for (i=0; i<=V; i++) {          // Angle between j looking at i and i+1...
H3                if (i==j || j==((i+1)%N))
H4                    turn(j-1,i) = 999       // an arbitrarily large number
                  else
                  {
H5                    vji = p[i] - p[j]
H6                    vji.Normalise()
H7                    vjip1 = p[(i+1)%N] - p[j]
H8                    vjip1.Normalise()
H9                    tau = ArcCos(vji.DotProduct(vjip1))
H10                   If (parallax)           // scale the angle if apparent turn is required
H11                       tau *= [L] ScaleFromX(p[j].X())
H12                   turn(j-1,i) = tau
                  }
              }
          }
          RETURN

I     FindCriticalTurn():
I1        critTurn = 999.0                    // Set to an arbitrary large number
I2        For (j=1; j<=U; j++)
I3            For (i=0; i<=V; i++)            // Angle between j looking at i and i+1...
I4                if (turn(j-1,i)<critTurn) {
I5                    critTurn = turn(j-1,i)
                  }
          RETURN critTurn

J     Y(u,xx)
J1        RETURN [K] ProjPosn(u)*( K + xx ) / K

K     ProjPosn(u)     // Calculates the y-coordinate of the projection of the u-th avatar
K1        gap = 2/(V-1)               // gap is the distance between the projections of the avatars
K2        RETURN 1.0-(u-1)*gap

L     ScaleFromX(xx)
L1        Return K/(K+xx);

CLAIMS



1. A computer conferencing system operable to carry out a conference by
animating three-dimensional computer models of the participants in
dependence upon real-world movements thereof, the system comprising a
plurality of user stations arranged to generate and exchange data so that each
user station displays a sequence of images of a respective three-dimensional
computer model containing three-dimensional computer models of the
participants at the other user stations, and such that movements of at least the
heads of the participants in real-life produce corresponding movements of the
three-dimensional computer models, wherein each user station comprises:

storage means storing data defining a three-dimensional conference
computer model containing a three-dimensional computer model of each
participant at the other user stations, the three-dimensional conference
computer model being different from the three-dimensional conference
computer model stored at each of the other user stations in the system;

means for generating and displaying images of the three-dimensional
conference computer model, wherein the content of the displayed images is
independent of the movement of the head of each viewing participant;

means for determining and outputting the position in the displayed
images at which each participant at the user station is looking; and

processing means for moving at least the head of the three-dimensional
computer model of each participant in dependence upon data received from the
other participants, so that images displayed at the user station convey the head
movements of the participants.

2. A system according to claim 1, wherein:

each user station further comprises means for recording and outputting
image data of at least the head of each participant at the user station; and

each user station is arranged to generate the image data for display by
rendering the image data for a participant onto the corresponding
three-dimensional computer model.

3. A system according to claim 1 or claim 2, wherein the three-dimensional
conference computer model at each user station further contains
a three-dimensional computer model of a character to be animated during the
conference but whose head movements during the animation are not
determined by the head movements of a participant to the conference.

4. Computer processing apparatus for use in a computer conferencing
system according to claim 1, comprising:

means for storing data defining a three-dimensional conference
computer model containing a three-dimensional computer model of each
participant at the other apparatus in the system;

means for generating image data for the display of images of the
three-dimensional conference computer model, such that the content of the
displayed images will be independent of movements of the heads of viewing
participants;

means for determining and outputting the position in a displayed image
at which each participant at the user station is looking; and

processing means for moving at least the head of the three-dimensional
computer model of each participant in dependence upon data received from the
other participants so that the displayed images will convey the head
movements of the participants.

5. Apparatus according to claim 4, wherein:

the apparatus further comprises means for outputting image data of the
head of each participant at the apparatus; and

the apparatus is arranged to generate the image data for display by
rendering received image data for a participant onto the corresponding
three-dimensional computer model.

6. Apparatus according to claim 4 or claim 5, further comprising means
for generating data defining the three-dimensional conference computer model
in accordance with a seating plan of the participants.



7. Apparatus according to claim 6, wherein:

the means for generating the data defining the three-dimensional
conference computer model is arranged to determine positions in the
three-dimensional conference computer model for the three-dimensional computer
models of the participants in dependence upon the seating plan, the width of
the display on which the images of the three-dimensional conference computer
model are to be displayed and a distance of a viewing participant from the
display; and

the means for generating the image data for display is arranged to
generate the image data by rendering the three-dimensional conference
computer model from a position defined by the distance of the viewing
participant from the display used to generate the data defining the
three-dimensional conference computer model.

8. Apparatus according to claim 7, wherein the means for generating the
data defining the three-dimensional conference computer model and the means
for generating the image data for display are arranged to change the positions
in the three-dimensional conference computer model of at least one of the
three-dimensional computer models of the participants and the position from
which the three-dimensional conference computer model is rendered to
generate the image data for display as the distance of the viewing participant
from the display changes during the conference.

9. Apparatus according to claim 7 or claim 8, wherein the means for
generating the data defining the three-dimensional conference computer model
is arranged to determine the positions for the three-dimensional computer
models of the participants such that, in the images displayed to the viewing
participant, the three-dimensional computer models of the participants are
substantially equally spaced across the display and the minimum movement
which the head of a three-dimensional computer model of a participant
undergoes to look from one participant to another is maximised.

10. Apparatus according to any of claims 4 to 9, further comprising means
for generating and outputting data defining movements of at least one body
part other than the head of each viewing participant.

11. Apparatus according to claim 10, wherein the means for generating the
data defining movements is arranged to generate data defining the
three-dimensional positions of discrete points on each viewing participant.



12. Apparatus according to claim 10 or claim 11, wherein the means for
generating the data defining movements comprises means for processing
signals defining images of each viewing participant to generate the data
defining the movements.

13. Apparatus according to claim 12, wherein the means for generating the
data defining movements comprises means for processing image data from a
plurality of cameras to generate the data defining the movements.

14. Apparatus according to claim 13, wherein the means for generating the
data defining movements comprises means for matching feature points in
images from respective cameras to generate the data defining the movements.

15. Apparatus according to claim 14, wherein the feature points comprise
at least one of lights and coloured markers.

16. Apparatus according to any of claims 4 to 15, wherein the means for
determining the position in a displayed image at which a participant is looking
comprises means for generating data defining the position relative to the
participants displayed in the image.

17. Apparatus according to any of claims 4 to 16, wherein the means for
determining the position in a displayed image at which a participant is looking
comprises means for processing signals defining images of the participant to
generate the data defining the position.

18. Apparatus according to claim 17, wherein the means for determining
the position in a displayed image at which a participant is looking is arranged
to determine the position in dependence upon the position of the participant's
head.

19. Apparatus according to claim 18, wherein the means for determining
the position in a displayed image at which a participant is looking is arranged
to determine the position by determining a plane representing the position of
the participant's head and projecting a line from the plane to the displayed
image.

20. Apparatus according to any of claims 4 to 19, further comprising 
calibration means for performing calibration processing to determine the 
position of a display screen on which the image data will be displayed. 

21. Apparatus according to claim 20, wherein the calibration means is
arranged to determine the position of the display screen by determining the
plane of the display screen and determining the position of the plane in three
dimensions.

22. Apparatus according to claim 21, wherein the calibration means is
arranged to determine the plane of the display screen and the position of the
plane in dependence upon the configuration of the participant's head when
looking at known positions on the display screen.

23. Apparatus according to any of claims 4 to 22, further comprising
display means for displaying the image data.

24. Apparatus for connection to a plurality of corresponding apparatus to
carry out a virtual meeting by animating participant avatars in dependence
upon movements of the real participants, wherein the apparatus is arranged to
store and animate a 3D computer model of the meeting which is different to
the 3D computer model stored at the corresponding apparatus.

25. A method of carrying out a computer conference by animating
three-dimensional computer models of the participants in dependence upon
real-world movements thereof, wherein data is exchanged between a plurality of
user stations so that each user station displays a sequence of images of a
respective three-dimensional computer model containing three-dimensional
computer models of the participants at the other user stations, and such that at
least head movements of the participants in real-life produce corresponding
movements of the three-dimensional computer models, wherein each user
station is operated such that:

data is stored defining a three-dimensional conference computer model
containing a three-dimensional computer model of each participant at the other
user stations, the three-dimensional conference computer model being
different from the three-dimensional conference computer models stored at
each of the other user stations in the system;

images are generated and displayed of the three-dimensional conference
computer model, wherein the content of the displayed images is independent
of the movement of the head of each viewing participant;

the position in the displayed images at which each participant at the
user station is looking is determined and output; and

at least the heads of the three-dimensional computer models of the
participants are moved in dependence upon data received from the other
participants, so that images displayed at the user station convey the movements
of the participants.

26. A method according to claim 25, wherein, at each user station:

image data of at least the head of each participant at the user station is
recorded and output; and

the image data for display is generated by rendering the image data for
a participant onto the corresponding three-dimensional computer model.

27. A method according to claim 25 or claim 26, wherein the three- 
dimensional conference computer model at each user station further contains 
a three-dimensional computer model of a character to be animated during the 
conference but whose head movements during the animation are not 
determined by the head movements of a participant to the conference. 

28. A method of operating a computer processing apparatus in a computer
conferencing system to carry out a conference between participants at a
plurality of apparatus, comprising:

storing data defining a three-dimensional conference computer model
containing a three-dimensional computer model of each participant at the other
apparatus;

generating image data for the display of images of the three-dimensional
conference computer model, such that the content of the displayed
images will be independent of movements of the heads of viewing participants;

determining and outputting the position in a displayed image at which
each participant at the user station is looking; and

moving at least the heads of the three-dimensional computer models of
the participants in dependence upon data received from the other participants
so that the displayed images will convey the head movements of the
participants.

29. A method according to claim 28, wherein:

the method further comprises recording and outputting image data of
the head of each participant at the apparatus; and

the image data for display is generated by rendering received image
data for a participant onto the corresponding three-dimensional computer
model.

30. A method according to claim 28 or claim 29, further comprising the
step of generating data defining the three-dimensional conference computer
model in accordance with a seating plan of the participants.

31. A method according to claim 30, wherein:

in the step of generating the data defining the three-dimensional
conference computer model, positions in the three-dimensional conference
computer model for the three-dimensional computer models of the participants
are determined in dependence upon the seating plan, the width of the display
on which the images of the three-dimensional conference computer model are
to be displayed and a distance of a viewing participant from the display; and

in the step of generating the image data for display, the image data is
generated by rendering the three-dimensional conference computer model from
a position defined by the distance of the viewing participant from the display
used to generate the data defining the three-dimensional conference computer
model.

32. A method according to claim 31, wherein, in the steps of generating the
data defining the three-dimensional conference computer model and generating
the image data for display, the positions in the three-dimensional conference
computer model of at least one of the three-dimensional computer models of
the participants and the position from which the three-dimensional conference
computer model is rendered to generate the image data for display are changed
as the distance of the viewing participant from the display changes during the
conference.


33. A method according to claim 31 or claim 32, wherein, in the step of
generating the data defining the three-dimensional conference computer 
model, the positions for the three-dimensional computer models of the 
participants are determined such that, in the images displayed to the viewing
participant, the three-dimensional computer models of the participants are
substantially equally spaced across the display and the minimum movement
which the head of a three-dimensional computer model of a participant 
undergoes to look from one participant to another is maximised. 






34. A method according to any of claims 28 to 33, further comprising the
step of generating and outputting data defining movements of at least one body
part other than the head of each viewing participant.

35. A method according to claim 34, wherein the data defining movements
is data defining the three-dimensional positions of discrete points. 


36. A method according to claim 34 or claim 35, wherein the data defining 
movements is generated by processing signals defining images. 

37. A method according to claim 36, wherein the data defining movements 
is generated by processing image data from a plurality of cameras.

38. A method according to claim 37, wherein the data defining movements 
is generated by matching feature points in images from respective cameras.

39. A method according to claim 38, wherein the feature points comprise
at least one of lights and coloured markers.

40. A method according to any of claims 28 to 39, wherein the data
defining the position in a displayed image at which a participant is looking
comprises data defining the position relative to the participants displayed in
the image.

41. A method according to any of claims 28 to 40, wherein the position in
a displayed image at which a participant is looking is generated by processing
signals defining images of the participant to generate the data defining the
position.

42. A method according to claim 41, wherein the position in a displayed
image at which a participant is looking is determined in dependence upon the 
position of the participant's head. 

43. A method according to claim 42, wherein the position in a displayed
image at which a participant is looking is determined by determining a plane
representing the position of the participant's head and projecting a line from 
the plane to the displayed image. 

44. A method according to any of claims 28 to 43, further comprising a
calibration step of performing processing to determine the position of a display
screen on which the image data will be displayed. 

45. A method according to claim 44, wherein the calibration step 
determines the position of the display screen by determining the plane of the
display screen and determining the position of the plane in three dimensions.

46. A method according to claim 45, wherein the calibration step 
determines the plane of the display screen and the position of the plane in
dependence upon the configuration of the participant's head when looking at
known positions on the display screen.

47. A method according to any of claims 28 to 46, further comprising a step 
of displaying the image data. 


48. A method of operating a computer processing apparatus to carry out a
virtual meeting by animating participant avatars in dependence upon
movements of the real participants, wherein movements of at least the
participants' heads produce corresponding movements of the avatars in a three-
dimensional computer model of the conference which is different to the three-
dimensional computer model of the conference at each other computer
processing apparatus participating in the conference.

49. A storage medium storing computer-useable instructions, which, when 
loaded into a programmable computer processing apparatus, enable the
apparatus to become configured as an apparatus according to at least one of 
claims 4 to 24. 

50. A signal conveying computer-useable instructions, which, when loaded
into a programmable computer processing apparatus, enable the apparatus to
become configured as an apparatus according to at least one of claims 4 to 24.

51. A storage medium storing computer-useable instructions, which, when 
loaded into a programmable computer processing apparatus, enable the 
apparatus to become operable to perform a method according to at least one 
of claims 28 to 48. 

52. A signal conveying computer-useable instructions, which, when loaded 
into a programmable computer processing apparatus, enable the apparatus to
become operable to perform a method according to at least one of claims 28 
to 48. 
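
By way of illustration only, the plane-and-line construction recited in claim 43 can be sketched as a line-plane intersection. The sketch assumes that the head plane and the display plane are each described by a point and a normal vector in one shared coordinate frame, and that the projected line leaves the head plane along its normal; the function name and parameters are hypothetical and are not taken from the specification.

```python
def gaze_point_on_display(head_point, head_normal, screen_point, screen_normal):
    """Intersect the line leaving the head plane along its normal with the display plane.

    All arguments are 3-tuples in one shared coordinate frame (an assumption).
    Returns the 3D intersection point, or None if the line is parallel to
    the display plane or points away from it.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    denom = dot(head_normal, screen_normal)
    if abs(denom) < 1e-9:
        return None  # line runs parallel to the display plane

    # Solve (head_point + t * head_normal - screen_point) . screen_normal = 0 for t.
    diff = tuple(s - h for s, h in zip(screen_point, head_point))
    t = dot(diff, screen_normal) / denom
    if t < 0:
        return None  # display plane lies behind the head plane

    return tuple(h + t * n for h, n in zip(head_point, head_normal))
```

For example, with a display in the plane z = 0 and a head 0.6 units in front of it facing the display squarely, the function returns the point on the display directly ahead of the head. A position relative to the displayed participants, as in claim 40, would then be derived from this point by a further step not shown here.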